Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval
Table of Contents
SESSION: Keynote I
Session chair: Ricardo Baeza-Yates
Understanding Human Language: Can NLP and Deep Learning Help?
Christopher Manning
Pages: 1-1
DOI: 10.1145/2911451.2926732

There is a lot of overlap between the core problems of information retrieval (IR) and natural language processing (NLP). An IR system gains from understanding a user need and from understanding documents, and hence being able to determine whether a document ...
SESSION: Keynote II
Session chair: Susan Dumais
Big Data in Climate: Opportunities and Challenges for Machine Learning
Vipin Kumar
Pages: 3-3
DOI: 10.1145/2911451.2911550

This talk will present an overview of research being done in a large interdisciplinary project on the development of novel data mining and machine learning approaches for analyzing massive amounts of climate and ecosystem data now available from satellite ...
SESSION: Evaluation I
Session chair: Ben Carterette
Statistical Significance, Power, and Sample Sizes: A Systematic Review of SIGIR and TOIS, 2006-2015
Tetsuya Sakai
Pages: 5-14
DOI: 10.1145/2911451.2911492

We conducted a systematic review of 840 SIGIR full papers and 215 TOIS papers published between 2006 and 2015. The original objective of the study was to identify IR effectiveness experiments that are seriously underpowered (i.e., the sample size is ...
Bayesian Performance Comparison of Text Classifiers
Dell Zhang, Jun Wang, Emine Yilmaz, Xiaoling Wang, Yuxin Zhou
Pages: 15-24
DOI: 10.1145/2911451.2911547

How can we know whether one classifier is really better than the other? In the area of text classification, since the publication of Yang and Liu's seminal SIGIR-1999 paper, it has become a standard practice for researchers to apply null-hypothesis significance ...
A General Linear Mixed Models Approach to Study System Component Effects
Nicola Ferro, Gianmaria Silvello
Pages: 25-34
DOI: 10.1145/2911451.2911530

Topic variance has a greater effect on performance than system variance, but it cannot be controlled by system developers, who can only try to cope with it. On the other hand, system variance is important on its own, since it is what system developers ...
SESSION: Speech and Conversation Systems
Session chair: Gareth Jones
Searching by Talking: Analysis of Voice Queries on Mobile Web Search
Ido Guy
Pages: 35-44
DOI: 10.1145/2911451.2911525

The growing popularity of mobile search and the advancement in voice recognition technologies have opened the door for web search users to speak their queries, rather than type them. While this kind of voice search is still in its infancy, it is gradually ...
Predicting User Satisfaction with Intelligent Assistants
Julia Kiseleva, Kyle Williams, Ahmed Hassan Awadallah, Aidan C. Crook, Imed Zitouni, Tasos Anastasakos
Pages: 45-54
DOI: 10.1145/2911451.2911521

There is a rapid growth in the use of voice-controlled intelligent personal assistants on mobile devices, such as Microsoft's Cortana, Google Now, and Apple's Siri. They significantly change the way users interact with search systems, not only because ...
Learning to Respond with Deep Neural Networks for Retrieval-Based Human-Computer Conversation System
Rui Yan, Yiping Song, Hua Wu
Pages: 55-64
DOI: 10.1145/2911451.2911542

Establishing an automatic conversation system between humans and computers is regarded as one of the most challenging problems in computer science, and it involves interdisciplinary techniques in information retrieval, natural language processing, artificial ...
SESSION: Retrieval Models
Session chair: Maarten de Rijke
Document Retrieval Using Entity-Based Language Models
Hadas Raviv, Oren Kurland, David Carmel
Pages: 65-74
DOI: 10.1145/2911451.2911508

We address the ad hoc document retrieval task by devising novel types of entity-based language models. The models utilize information about single terms in the query and documents as well as term sequences marked as entities by some entity-linking tool. ...
Engineering Quality and Reliability in Technology-Assisted Review
Gordon V. Cormack, Maura R. Grossman
Pages: 75-84
DOI: 10.1145/2911451.2911510

The objective of technology-assisted review ("TAR") is to find as much relevant information as possible with reasonable effort. Quality is a measure of the extent to which a TAR method achieves this objective, while reliability is a measure of how consistently ...
A Sequential Decision Formulation of the Interface Card Model for Interactive IR
Yinan Zhang, Chengxiang Zhai
Pages: 85-94
DOI: 10.1145/2911451.2911543

The Interface Card model is a promising new theoretical framework for modeling and optimizing interactive retrieval interfaces, but how to systematically instantiate it to solve concrete interface optimization problems remains an open challenge. We propose ...
SESSION: Learning-to-rank
Session chair: Matthew Lease
Generalized BROOF-L2R: A General Framework for Learning to Rank Based on Boosting and Random Forests
Clebson C.A. de Sá, Marcos A. Gonçalves, Daniel X. Sousa, Thiago Salles
Pages: 95-104
DOI: 10.1145/2911451.2911540

The task of retrieving information that really matters to users is considered hard given the current and ever-increasing amount of available information. To improve the effectiveness of this information seeking task, systems ...
An Optimization Framework for Remapping and Reweighting Noisy Relevance Labels
Yury Ustinovskiy, Valentina Fedorova, Gleb Gusev, Pavel Serdyukov
Pages: 105-114
DOI: 10.1145/2911451.2911501

Relevance labels are an essential part of any learning-to-rank framework. The rapid development of crowdsourcing platforms has led to a significant reduction in the cost of manual labeling. This makes it possible to collect very large sets of labeled documents ...
Learning to Rank with Selection Bias in Personal Search
Xuanhui Wang, Michael Bendersky, Donald Metzler, Marc Najork
Pages: 115-124
DOI: 10.1145/2911451.2911537

Click-through data has proven to be a critical resource for improving search ranking quality. Though a large amount of click data can be easily collected by search engines, various biases make it difficult to fully leverage this type of data. In the ...
SESSION: Music and Math
Session chair: Jaap Kamps
On Effective Personalized Music Retrieval by Exploring Online User Behaviors
Zhiyong Cheng, Shen Jialie, Steven C.H. Hoi
Pages: 125-134
DOI: 10.1145/2911451.2911491

In this paper, we study the problem of personalized text-based music retrieval, which takes users' music preferences on songs into account via the analysis of online listening behaviours and social tags. Towards this goal, a novel Dual-Layer Music Preference ...
Semantification of Identifiers in Mathematics for Better Math Information Retrieval
Moritz Schubotz, Alexey Grigorev, Marcus Leich, Howard S. Cohl, Norman Meuschke, Bela Gipp, Abdou S. Youssef, Volker Markl
Pages: 135-144
DOI: 10.1145/2911451.2911503

Mathematical formulae are essential in science, but face challenges of ambiguity, due to the use of a small number of identifiers to represent an immense number of concepts. Corresponding to word sense disambiguation in Natural Language Processing, we ...
Multi-Stage Math Formula Search: Using Appearance-Based Similarity Metrics at Scale
Richard Zanibbi, Kenny Davila, Andrew Kane, Frank Wm. Tompa
Pages: 145-154
DOI: 10.1145/2911451.2911512

When using a mathematical formula for search (query-by-expression), the suitability of retrieved formulae often depends more upon symbol identities and layout than deep mathematical semantics. Using a Symbol Layout Tree representation for formula appearance, ...
SESSION: Microblog
Session chair: Mark D. Smucker
Explainable User Clustering in Short Text Streams
Yukun Zhao, Shangsong Liang, Zhaochun Ren, Jun Ma, Emine Yilmaz, Maarten de Rijke
Pages: 155-164
DOI: 10.1145/2911451.2911522

User clustering has been studied from different angles: behavior-based, to identify similar browsing or search patterns, and content-based, to identify shared interests. Once user clusters have been found, they can be used for recommendation and personalization. ...
Topic Modeling for Short Texts with Auxiliary Word Embeddings
Chenliang Li, Haoran Wang, Zhiqian Zhang, Aixin Sun, Zongyang Ma
Pages: 165-174
DOI: 10.1145/2911451.2911499

For many applications that require semantic understanding of short texts, inferring discriminative and coherent latent topics from short texts is a critical and fundamental task. Conventional topic models largely rely on word co-occurrences to derive ...
Interleaved Evaluation for Retrospective Summarization and Prospective Notification on Document Streams
Xin Qian, Jimmy Lin, Adam Roegiest
Pages: 175-184
DOI: 10.1145/2911451.2911494

We propose and validate a novel interleaved evaluation methodology for two complementary information seeking tasks on document streams: retrospective summarization and prospective notification. In the first, the user desires relevant and non-redundant ...
SESSION: Web Search
Session chair: David Hawking
Learning Query and Document Relevance from a Web-scale Click Graph
Shan Jiang, Yuening Hu, Changsung Kang, Tim Daly, Jr., Dawei Yin, Yi Chang, Chengxiang Zhai
Pages: 185-194
DOI: 10.1145/2911451.2911531

Click-through logs over query-document pairs provide rich and valuable information for multiple tasks in information retrieval. This paper proposes a vector propagation algorithm on the click graph to learn vector representations for both queries and ...
Click-based Hot Fixes for Underperforming Torso Queries
Masrour Zoghi, Tomáš Tunys, Lihong Li, Damien Jose, Junyan Chen, Chun Ming Chin, Maarten de Rijke
Pages: 195-204
DOI: 10.1145/2911451.2911500

Ranking documents using their historical click-through rate (CTR) can improve relevance for frequently occurring queries, i.e., so-called head queries. It is difficult to use such click signals on non-head queries as they receive fewer clicks. In this ...
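As a rough illustration of the click-through-rate (CTR) ranking this abstract starts from, the Python sketch below aggregates a hypothetical impression log of `(query, doc, clicked)` tuples and re-ranks a query's documents by CTR only when every document has at least `min_impressions` impressions, which loosely stands in for a head query. The log format, the threshold of 100, and the function names `ctr_table` and `rerank_by_ctr` are assumptions for illustration; this is not the hot-fix method proposed in the paper.

```python
from collections import defaultdict


def ctr_table(log):
    """Aggregate hypothetical (query, doc, clicked) impressions into
    {(query, doc): (clicks, impressions)}."""
    stats = defaultdict(lambda: [0, 0])
    for query, doc, clicked in log:
        entry = stats[(query, doc)]
        entry[0] += 1 if clicked else 0
        entry[1] += 1
    return {key: (clicks, shown) for key, (clicks, shown) in stats.items()}


def rerank_by_ctr(query, docs, table, min_impressions=100):
    """Re-rank docs by historical CTR, but only if every doc has enough impressions."""
    ctrs = []
    for doc in docs:
        clicks, shown = table.get((query, doc), (0, 0))
        if shown < min_impressions:
            # Too little click evidence (typical of non-head queries): keep the original ranking.
            return list(docs)
        ctrs.append(clicks / shown)
    # sorted() is stable, so documents with equal CTR keep their original relative order.
    order = sorted(range(len(docs)), key=lambda i: -ctrs[i])
    return [docs[i] for i in order]


# Toy log: doc "a" is clicked on 10 of 100 impressions, doc "b" on 40 of 100.
log = [("weather", "a", i < 10) for i in range(100)] + [("weather", "b", i < 40) for i in range(100)]
table = ctr_table(log)
print(rerank_by_ctr("weather", ["a", "b"], table))  # ['b', 'a']
```

The early return is the point of the sketch: below the threshold there is simply no CTR evidence to act on, which is the situation the abstract describes for non-head queries.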
A Context-aware Time Model for Web Search (Best Student Paper)
Alexey Borisov, Ilya Markov, Maarten de Rijke, Pavel Serdyukov
Pages: 205-214
DOI: 10.1145/2911451.2911504

In web search, information about times between user actions has been shown to be a good indicator of users' satisfaction with the search results. Existing work uses the mean values of the observed times, or fits probability distributions to the observed ...
SESSION: Question Answering
Session chair: Hideo Joho
Novelty based Ranking of Human Answers for Community Questions
Adi Omari, David Carmel, Oleg Rokhlenko, Idan Szpektor
Pages: 215-224
DOI: 10.1145/2911451.2911506

Questions and their corresponding answers within a community-based question answering (CQA) site are frequently presented as top search results for Web search queries and viewed by millions of searchers daily. The number of answers for CQA questions ranges ...
That's Not My Question: Learning to Weight Unmatched Terms in CQA Vertical Search
Boaz Petersil, Avihai Mejer, Idan Szpektor, Koby Crammer
Pages: 225-234
DOI: 10.1145/2911451.2911496

A fundamental task in Information Retrieval (IR) is term weighting. Early IR theory considered the presence or absence of all terms in the lexicon for ranking and needed to weight them all. Yet, as the size of lexicons grew and models became too ...
When a Knowledge Base Is Not Enough: Question Answering over Knowledge Bases with External Text Data
Denis Savenkov, Eugene Agichtein
Pages: 235-244
DOI: 10.1145/2911451.2911536

One of the major challenges for automated question answering over Knowledge Bases (KBQA) is translating a natural language question to the Knowledge Base (KB) entities and predicates. Previous systems have used a limited amount of training data to learn ...
SESSION: Learning
Session chair: Emine Yilmaz
Transfer Learning for Cross-Lingual Sentiment Classification with Weakly Shared Deep Neural Networks
Guangyou Zhou, Zhao Zeng, Jimmy Xiangji Huang, Tingting He
Pages: 245-254
DOI: 10.1145/2911451.2911490

Cross-lingual sentiment classification aims to automatically predict sentiment polarity (e.g., positive or negative) of data in a label-scarce target language by exploiting labeled data from a label-rich language. The fundamental challenge of cross-lingual ...
Query to Knowledge: Unsupervised Entity Extraction from Shopping Queries using Adaptor Grammars
Ke Zhai, Zornitsa Kozareva, Yuening Hu, Qi Li, Weiwei Guo
Pages: 255-264
DOI: 10.1145/2911451.2911495

Web search queries provide a surprisingly large amount of information, which can be potentially organized and converted into a knowledge base. In this paper, we focus on the problem of automatically identifying brand and product entities from a large ...
Learning for Efficient Supervised Query Expansion via Two-stage Feature Selection
Zhiwei Zhang, Qifan Wang, Luo Si, Jianfeng Gao
Pages: 265-274
DOI: 10.1145/2911451.2911539

Query expansion (QE) is a well-known technique to improve retrieval effectiveness, which expands original queries with extra terms that are predicted to be relevant. A recent trend in the literature is Supervised Query Expansion (SQE), where supervised ...
SESSION: Efficiency I
Session chair: Alistair Moffat
Leveraging Context-Free Grammar for Efficient Inverted Index Compression
Zhaohua Zhang, Jiancong Tong, Haibing Huang, Jin Liang, Tianlong Li, Rebecca J. Stones, Gang Wang, Xiaoguang Liu
Pages: 275-284
DOI: 10.1145/2911451.2911518

Large-scale search engines need to answer thousands of queries per second over billions of documents, which is typically done by querying a large inverted index. Many highly optimized integer encoding techniques are applied to compress the inverted index ...
Fast and Compact Hamming Distance Index
Simon Gog, Rossano Venturini
Pages: 285-294
DOI: 10.1145/2911451.2911523

Searching for similar objects in a collection is a core task of many applications in databases, pattern recognition, and information retrieval. As there exist similarity-preserving hash functions like SimHash, indexing these objects reduces to the solution ...
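The reduction this abstract mentions, from similarity search to Hamming-distance search via SimHash, can be illustrated with a minimal Python sketch. Assumptions: a generic Charikar-style SimHash over lowercased whitespace tokens with unit weights, MD5 as the per-token hash, fingerprints of at most 64 bits, and a plain linear scan for the radius query. The function names are hypothetical, and the compact index structure proposed in the paper is not shown.

```python
import hashlib


def simhash(text: str, bits: int = 64) -> int:
    """Generic SimHash fingerprint of a text (unit token weights, MD5 per token)."""
    if not 1 <= bits <= 64:
        raise ValueError("this sketch supports 1 to 64 bits")
    votes = [0] * bits
    for token in text.lower().split():
        # First 8 bytes of the MD5 digest give a 64-bit token hash.
        h = int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    # Bit i of the fingerprint is set when the vote for bit i is positive.
    fingerprint = 0
    for i in range(bits):
        if votes[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def within_radius(query: int, fingerprints: list, k: int) -> list:
    """Linear-scan baseline: indices of fingerprints within Hamming distance k of query."""
    return [i for i, f in enumerate(fingerprints) if hamming(query, f) <= k]


docs = ["the quick brown fox", "the quick brown dog", "completely unrelated text here"]
index = [simhash(d) for d in docs]
print(within_radius(simhash("the quick brown fox"), index, k=3))
```

A linear scan costs one XOR and popcount per stored fingerprint; the point of a dedicated Hamming distance index is to avoid touching every fingerprint for small radii `k`.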
Fast First-Phase Candidate Generation for Cascading Rankers
Qi Wang, Constantinos Dimopoulos, Torsten Suel
Pages: 295-304
DOI: 10.1145/2911451.2911515

Current search engines use very complex ranking functions based on hundreds of features. While such functions return high-quality results, they create efficiency challenges as it is too costly to fully evaluate them on all documents in the union, or ...
SESSION: Recommendation Systems I
Session chair: Oren Kurland
Learning to Rank Features for Recommendation over Multiple Categories
Xu Chen, Zheng Qin, Yongfeng Zhang, Tao Xu
Pages: 305-314
DOI: 10.1145/2911451.2911549

Incorporating phrase-level sentiment analysis on users' textual reviews for recommendation has become a popular method due to its explainable property for latent features and high prediction accuracy. However, the inherent limitations of the existing ...
How Much Novelty is Relevant?: It Depends on Your Curiosity
Pengfei Zhao, Dik Lun Lee
Pages: 315-324
DOI: 10.1145/2911451.2911488

Traditional recommendation systems (RS's) aim to recommend items that are relevant to the user's interest. Unfortunately, the recommended items will soon become too familiar to the user and hence fail to arouse her interest. Discovery-oriented recommendation ...
Discrete Collaborative Filtering
Hanwang Zhang, Fumin Shen, Wei Liu, Xiangnan He, Huanbo Luan, Tat-Seng Chua
Pages: 325-334
DOI: 10.1145/2911451.2911502

We address the efficiency problem of Collaborative Filtering (CF) by hashing users and items as latent vectors in the form of binary codes, so that user-item affinity can be efficiently calculated in a Hamming space. However, existing hashing methods ...
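The abstract relies on user-item affinity for binary codes being computable in Hamming space. For codes in {-1, +1}^r, the inner product equals r minus twice the Hamming distance, so ranking items by inner product is the same as ranking them by ascending Hamming distance. The Python sketch below only demonstrates this identity; the bit-packing scheme and the helper names `to_bits`, `affinity`, and `rank_items` are hypothetical, and learning the codes, which is the paper's contribution, is not shown.

```python
def to_bits(code):
    """Pack a {-1, +1} code into an int: bit i is set when code[i] == +1."""
    value = 0
    for i, c in enumerate(code):
        if c not in (-1, 1):
            raise ValueError("codes must contain only -1 and +1")
        if c == 1:
            value |= 1 << i
    return value


def affinity(user_bits, item_bits, r):
    """Inner product of two {-1, +1}^r codes, computed as r - 2 * Hamming distance."""
    return r - 2 * bin(user_bits ^ item_bits).count("1")


def rank_items(user_bits, item_bits_list, r):
    """Item indices ordered by decreasing affinity (equivalently, increasing Hamming distance)."""
    return sorted(range(len(item_bits_list)), key=lambda j: -affinity(user_bits, item_bits_list[j], r))


user = [1, -1, 1, 1]
item = [1, 1, -1, 1]
print(affinity(to_bits(user), to_bits(item), 4))  # 0, same as 1 - 1 - 1 + 1
```

The identity holds because each agreeing position contributes +1 and each disagreeing position contributes -1 to the inner product, so with d disagreements the sum is (r - d) - d.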
SESSION: User Needs
Session chair: Diane Kelly
Understanding Information Need: An fMRI Study (Best Paper)
Yashar Moshfeghi, Peter Triantafillou, Frank E. Pollick
Pages: 335-344
DOI: 10.1145/2911451.2911534

The raison d'etre of IR is to satisfy human information need. But, do we really understand information need? Despite advances in the past few decades in both the IR and relevant scientific communities, this question is largely unanswered. We do not really ...
User Behavior in Asynchronous Slow Search
Ryan Burton, Kevyn Collins-Thompson
Pages: 345-354
DOI: 10.1145/2911451.2911541

Conventional Web search is predicated on returning results to users as quickly as possible. However, for some search tasks, users have reported a willingness to wait for the perfect set of results. In this work, we present the first study to analyze ...
Going back in Time: An Investigation of Social Media Re-finding
Florian Meier, David Elsweiler
Pages: 355-364
DOI: 10.1145/2911451.2911524

Social Media (SM) has become a valuable information source to many in diverse situations. In IR, research has focused on real-time aspects and as such little is known about how long SM content is of value to users, if and how often it is re-accessed, ...
SESSION: Privacy, Advertising, and Products
Session chair: Grace Hui Yang
R-Susceptibility: An IR-Centric Approach to Assessing Privacy Risks for Users in Online Communities
Joanna Asia Biega, Krishna P. Gummadi, Ida Mele, Dragan Milchevski, Christos Tryfonopoulos, Gerhard Weikum
Pages: 365-374
DOI: 10.1145/2911451.2911533

Privacy of Internet users is at stake because they expose personal information in posts created in online communities, in search queries, and other activities. An adversary that monitors a community may identify the users with the most sensitive properties ...
Scalable Semantic Matching of Queries to Ads in Sponsored Search Advertising
Mihajlo Grbovic, Nemanja Djuric, Vladan Radosavljevic, Fabrizio Silvestri, Ricardo Baeza-Yates, Andrew Feng, Erik Ordentlich, Lee Yang, Gavin Owens
Pages: 375-384
DOI: 10.1145/2911451.2911538

Sponsored search represents a major source of revenue for web search engines. The advertising model brings a unique possibility for advertisers to target direct user intent communicated through a search query, usually done by displaying their ads alongside ...
Retrieving Non-Redundant Questions to Summarize a Product Review
Mengwen Liu, Yi Fang, Dae Hoon Park, Xiaohua Hu, Zhengtao Yu
Pages: 385-394
DOI: 10.1145/2911451.2911544

Product reviews have become an important resource for customers before they make purchase decisions. However, the abundance of reviews makes it difficult for customers to digest them and make informed choices. In our study, we aim to help customers who ...
SESSION: Novelty and Diversity
Session chair: Charlie L.A. Clarke
Modeling Document Novelty with Neural Tensor Network for Search Result Diversification
Long Xia, Jun Xu, Yanyan Lan, Jiafeng Guo, Xueqi Cheng
Pages: 395-404
DOI: 10.1145/2911451.2911498

Search result diversification has attracted considerable attention as a means to tackle the ambiguous or multi-faceted information needs of users. One of the key problems in search result diversification is novelty, that is, how to measure the novelty ...
ScentBar: A Query Suggestion Interface Visualizing the Amount of Missed Relevant Information for Intrinsically Diverse Search
Kazutoshi Umemoto, Takehiro Yamamoto, Katsumi Tanaka
Pages: 405-414
DOI: 10.1145/2911451.2911546

For intrinsically diverse tasks, in which collecting extensive information from different aspects of a topic is required, searchers often have difficulty formulating queries to explore diverse aspects and deciding when to stop searching. With the goal ...
Evaluating Search Result Diversity using Intent Hierarchies
Xiaojie Wang, Zhicheng Dou, Tetsuya Sakai, Ji-Rong Wen
Pages: 415-424
DOI: 10.1145/2911451.2911497

Search result diversification aims at returning diversified document lists to cover different user intents for ambiguous or broad queries. Existing diversity measures assume that user intents are independent or exclusive, and do not consider the relationships ...
SESSION: Entities and Knowledge Graphs
Session chair: Jamie Callan
Robust and Collective Entity Disambiguation through Semantic Embeddings
Stefan Zwicklbauer, Christin Seifert, Michael Granitzer
Pages: 425-434
DOI: 10.1145/2911451.2911535

Entity disambiguation is the task of mapping ambiguous terms in natural-language text to their entities in a knowledge base. It finds its application in the extraction of structured data in RDF (Resource Description Framework) from textual documents, but ...
Parameterized Fielded Term Dependence Models for Ad-hoc Entity Retrieval from Knowledge Graph
Fedor Nikolaev, Alexander Kotov, Nikita Zhiltsov
Pages: 435-444
DOI: 10.1145/2911451.2911545

Accurate projection of terms in free-text queries onto structured entity representations is one of the fundamental problems in entity retrieval from knowledge graphs. In this paper, we demonstrate that existing retrieval models for ad-hoc structured ...
Hierarchical Random Walk Inference in Knowledge Graphs
Qiao Liu, Liuyi Jiang, Minghao Han, Yao Liu, Zhiguang Qin
Pages: 445-454
DOI: 10.1145/2911451.2911509

Relational inference is a crucial technique for knowledge base population. The central problem in the study of relational inference is to infer unknown relations between entities from the facts given in the knowledge bases. Two popular models have been ...
SESSION: SIRIP I: Big companies, big data
Session chair: Gilad Mishne
When Watson Went to Work: Leveraging Cognitive Computing in the Real World
Aya Soffer, David Konopnicki, Haggai Roitman
Pages: 455-456
DOI: 10.1145/2911451.2926724
Ask Your TV: Real-Time Question Answering with Recurrent Neural Networks
Ferhan Ture, Oliver Jojic
Pages: 457-458
DOI: 10.1145/2911451.2926729

Voice-based interfaces are very popular in today's world, and Comcast customers are no exception. Usage stats show that our new X1 TV platform receives millions of voice queries per day. As a result, expanding the coverage of our voice interface provides ...
Amazon Search: The Joy of Ranking Products
Daria Sorokina, Erick Cantu-Paz
Pages: 459-460
DOI: 10.1145/2911451.2926725

Amazon is one of the world's largest e-commerce sites and Amazon Search powers the majority of Amazon's sales. As a consequence, even small improvements in relevance ranking both positively influence the shopping experience of millions of customers and ...
Learning to Rank Personalized Search Results in Professional Networks
Viet Ha-Thuc, Shakti Sinha
Pages: 461-462
DOI: 10.1145/2911451.2927018

LinkedIn search is deeply personalized - for the same queries, different searchers expect completely different results. This paper presents our approach to achieving this by mining various data sources available in LinkedIn to infer searchers' intents ...
SESSION: Evaluation II
Session chair: Tetsuya Sakai
When does Relevance Mean Usefulness and User Satisfaction in Web Search?
Jiaxin Mao, Yiqun Liu, Ke Zhou, Jian-Yun Nie, Jingtao Song, Min Zhang, Shaoping Ma, Jiashen Sun, Hengliang Luo
Pages: 463-472
DOI: 10.1145/2911451.2911507

Relevance is a fundamental concept in information retrieval (IR) studies. It is however often observed that relevance as annotated by secondary assessors may not necessarily mean usefulness and satisfaction perceived by users. In this study, we confirm ...
How Many Workers to Ask?: Adaptive Exploration for Collecting High Quality Labels
Ittai Abraham, Omar Alonso, Vasilis Kandylas, Rajesh Patel, Steven Shelford, Aleksandrs Slivkins
Pages: 473-482
DOI: 10.1145/2911451.2911514

Crowdsourcing has been part of the IR toolbox as a cheap and fast mechanism to obtain labels for system development and evaluation. Successful deployment of crowdsourcing at scale involves adjusting many variables, a very important one being the number ...
Risk-Sensitive Evaluation and Learning to Rank using Multiple Baselines
B. Taner Dinçer, Craig Macdonald, Iadh Ounis
Pages: 483-492
DOI: 10.1145/2911451.2911511

A robust retrieval system ensures that user experience is not damaged by the presence of poorly-performing queries. Such robustness can be measured by risk-sensitive evaluation measures, which assess the extent to which a system performs worse than a ...
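To make the idea of a risk-sensitive measure concrete, here is a minimal Python sketch of a single-baseline, URisk-style score: per-query differences between a system and one baseline are averaged, with losses (queries where the system does worse) weighted by (1 + alpha). The function name `urisk`, the default `alpha = 1.0`, and the use of plain per-query effectiveness scores are assumptions for illustration; the paper's contribution, evaluation and learning to rank with multiple baselines, is not shown.

```python
def urisk(system, baseline, alpha=1.0):
    """Mean per-query difference from a baseline, with losses weighted by (1 + alpha).

    system, baseline: per-query effectiveness scores in the same query order.
    alpha = 0 reduces to the plain mean difference; larger alpha penalizes losses more.
    """
    if len(system) != len(baseline) or not system:
        raise ValueError("need equal-length, non-empty per-query score lists")
    total = 0.0
    for s, b in zip(system, baseline):
        delta = s - b
        # Wins count as-is; losses are amplified by the risk-aversion factor.
        total += delta if delta >= 0 else (1.0 + alpha) * delta
    return total / len(system)


print(urisk([0.5, 0.2, 0.4], [0.3, 0.4, 0.4], alpha=1.0))
```

With `alpha = 1`, a system that gains 0.2 on one query and loses 0.2 on another scores below zero, which captures the intuition that a robust system should not hurt users on poorly-performing queries.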
SESSION: Events
Session chair: Fernando Diaz
Event Digest: A Holistic View on Past Events
Arunav Mishra, Klaus Berberich
Pages: 493-502
DOI: 10.1145/2911451.2911526

For a general user, easy access to vast amounts of online information available on past events has made retrospection much harder. We propose the problem of automatic event digest generation to aid effective and efficient retrospection. For this, in addition ...
Terms over LOAD: Leveraging Named Entities for Cross-Document Extraction and Summarization of Events
Andreas Spitz, Michael Gertz
Pages: 503-512
DOI: 10.1145/2911451.2911529

Real-world events, such as historic incidents, typically contain both spatial and temporal aspects and involve a specific group of persons. This is reflected in the descriptions of events in textual sources, which contain mentions of named entities and ...
GeoBurst: Real-Time Local Event Detection in Geo-Tagged Tweet Streams
Chao Zhang, Guangyu Zhou, Quan Yuan, Honglei Zhuang, Yu Zheng, Lance Kaplan, Shaowen Wang, Jiawei Han
Pages: 513-522
DOI: 10.1145/2911451.2911519

The real-time discovery of local events (e.g., protests, crimes, disasters) is of great importance to various applications, such as crime monitoring, disaster alarming, and activity recommendation. While this task was nearly impossible years ago due ...
SESSION: SIRIP II: Small companies, big ideas
Session chair: Gilad Mishne
Building a Self-Learning Search Engine: From Research to Business
Manos Tsagkias, Wouter Weerkamp
Pages: 523-524
DOI: 10.1145/2911451.2926728

904Labs B.V. was founded in 2014 by Wouter Weerkamp, Manos Tsagkias, and Maarten de Rijke to commercialize state-of-the-art search engine technology. 904Labs' strategic product is a self-learning search engine for online retailers, which uses some of ...
Sedano: A News Stream Processor for Business
Ugo Scaiella, Giacomo Berardi, Giuliano Mega, Roberto Santoro
Pages: 525-526
DOI: 10.1145/2911451.2926730

We present Sedano, a system for processing and indexing a continuous stream of business-related news. Sedano defines pipelines whose stages analyze and enrich news items (e.g., newspaper articles and press releases). News data coming from several content ...
Ranking Financial Tweets
Diego Ceccarelli, Francesco Nidito, Miles Osborne
Pages: 527-528
DOI: 10.1145/2911451.2926727

Recently, Twitter has complemented traditional newswire as a source of valuable financial information. Although there is a rich body of published research dealing with the task of ranking tweets, there has been little published research dealing with ranking ...
SESSION: Recommendation Systems II
Session chair: Josiane Mothe
Contextual Bandits in a Collaborative Environment
Qingyun Wu, Huazheng Wang, Quanquan Gu, Hongning Wang
Pages: 529-538
DOI: 10.1145/2911451.2911528

Contextual bandit algorithms provide principled online learning solutions to find optimal trade-offs between exploration and exploitation with companion side-information. They have been extensively used in many important practical scenarios, such as ...
Collaborative Filtering Bandits
Shuai Li, Alexandros Karatzoglou, Claudio Gentile
Pages: 539-548
DOI: 10.1145/2911451.2911548

Classical collaborative filtering and content-based filtering methods try to learn a static recommendation model given training data. These approaches are far from ideal in highly dynamic recommendation domains such as news recommendation and computational ...
Fast Matrix Factorization for Online Recommendation with Implicit Feedback
Xiangnan He, Hanwang Zhang, Min-Yen Kan, Tat-Seng Chua
Pages: 549-558
DOI: 10.1145/2911451.2911489

This paper contributes improvements on both the effectiveness and efficiency of Matrix Factorization (MF) methods for implicit feedback. We highlight two critical issues of existing works. First, due to the large space of unobserved feedback, most existing ...
SESSION: Image and Multimodal Search
Session chair: Gabriella Pasi
Leveraging User Interaction Signals for Web Image Search
Neil O'Hare, Paloma de Juan, Rossano Schifanella, Yunlong He, Dawei Yin, Yi Chang
Pages: 559-568
DOI: 10.1145/2911451.2911532

User interfaces for web image search engine results differ significantly from interfaces for traditional (text) web search results, supporting a richer interaction. In particular, users can see an enlarged image preview by hovering over a result image, ...
Self-Paced Cross-Modal Subspace Matching
Jian Liang, Zhihang Li, Dong Cao, Ran He, Jingdong Wang
Pages: 569-578
DOI: 10.1145/2911451.2911527

Cross-modal matching methods match data from different modalities according to their similarities. Most existing methods utilize label information to reduce the semantic gap between different modalities. However, it is usually time-consuming to manually ...
Composite Correlation Quantization for Efficient Multimodal Retrieval
Mingsheng Long, Yue Cao, Jianmin Wang, Philip S. Yu
Pages: 579-588
DOI: 10.1145/2911451.2911493

Efficient similarity retrieval from large-scale multimodal databases is pervasive in modern search engines and social networks. To support queries across content modalities, the system should enable cross-modal correlation and computation-efficient indexing. ...
SESSION: SIRIP III: Modeling and Evaluation
Session chair: Jussi Karlgren
Principles for the Design of Online A/B Metrics
Widad Machmouchi, Georg Buscher
Pages: 589-590
DOI: 10.1145/2911451.2926731

In this paper, we describe principles for designing metrics in the context of A/B experiments. We share some issues that come up in designing such experiments and provide solutions to avoid such pitfalls.
Visual Recommendation Use Case for an Online Marketplace Platform: allegro.pl
Anna Wróblewska, Łukasz Rączkowski
Pages: 591-594
DOI: 10.1145/2911451.2926722

In this paper we describe a small content-based visual recommendation project built as part of the Allegro online marketplace platform. We extracted relevant data only from images, as they are inherently better at capturing visual attributes than textual ...
AOL's Named Entity Resolver: Solving Disambiguation via Document Strongly Connected Components and Ad-Hoc Edges Construction
Roni Wiener, Yonatan Ben-Simhon, Anna Chen
Pages: 595-596
DOI: 10.1145/2911451.2926721

Named Entity Disambiguation is the task of disambiguating named entity mentions in unstructured text and linking them to their corresponding entries in a large knowledge base such as Freebase. Practically, each text match in a given document should be ...
The Data Stack in Information Retrieval
Omar Alonso
Pages: 597-597
doi>10.1145/2911451.2926726

I propose to look at information retrieval applications from the perspective of the data stack infrastructure that is needed in research prototypes and production systems.
SESSION: Behavior Models and Applications
David Elsweiler
Predicting User Engagement with Direct Displays Using Mouse Cursor Information
Ioannis Arapakis, Luis A. Leiva
Pages: 599-608
doi>10.1145/2911451.2911505

Predicting user engagement with direct displays (DD) is of paramount importance to commercial search engines, as well as to search performance evaluation. However, understanding within-content engagement on a web page is not a trivial task mainly because ...
Search Result Prefetching Using Cursor Movement
Fernando Diaz, Qi Guo, Ryen W. White
Pages: 609-618
doi>10.1145/2911451.2911516

Search result examination is an important part of searching. High page load latency for landing pages (clicked results) can reduce the efficiency of the search process. Proactively prefetching landing pages in advance of clickthrough can save searchers ...
Predicting Search User Examination with Visual Saliency
Yiqun Liu, Zeyang Liu, Ke Zhou, Meng Wang, Huanbo Luan, Chao Wang, Min Zhang, Shaoping Ma
Pages: 619-628
doi>10.1145/2911451.2911517

Predicting users' examination of search results is one of the key concerns in Web search-related studies. With more and more heterogeneous components federated into search engine result pages (SERPs), it becomes difficult for traditional position-based ...
SESSION: Efficiency II
Rossano Venturini
A Comparison of Cache Blocking Methods for Fast Execution of Ensemble-based Score Computation
Xin Jin, Tao Yang, Xun Tang
Pages: 629-638
doi>10.1145/2911451.2911520

Machine-learned classification and ranking techniques often use ensembles to aggregate partial scores of feature vectors for high accuracy, and the runtime score computation can become expensive when employing a large number of ensembles. The previous ...
Improved Caching Techniques for Large-Scale Image Hosting Services
Xiao Bai, B. Barla Cambazoglu, Archie Russell
Pages: 639-648
doi>10.1145/2911451.2911513

Commercial image serving systems, such as Flickr and Facebook, rely on large image caches to avoid, as much as possible, retrieving requested images from the costly backend image store. Such systems serve the same image in different resolutions ...
SESSION: Short Collection Papers
A Complete & Comprehensive Movie Review Dataset (CCMR)
Xuezhi Cao, Weiyue Huang, Yong Yu
Pages: 661-664
doi>10.1145/2911451.2914669

Online review sites are widely used in various domains, including movies and restaurants. These sites now have a strong influence on users during the purchasing process. There is plenty of research on review sites covering various aspects, including ...
A Cross-Platform Collection of Social Network Profiles
Maria Han Veiga, Carsten Eickhoff
Pages: 665-668
doi>10.1145/2911451.2914666

The proliferation of Internet-enabled devices and services has led to a shifting balance between digital and analogue aspects of our everyday lives. In the face of this development there is a growing demand for the study of privacy hazards, the potential ...
A Test Collection for Matching Patients to Clinical Trials
Bevan Koopman, Guido Zuccon
Pages: 669-672
doi>10.1145/2911451.2914672

We present a test collection to study the use of search engines for matching eligible patients (the query) to clinical trials (the document). Clinical trials are experiments conducted in the development of new medical treatments, drugs or devices. Recruiting ...
ArabicWeb16: A New Crawl for Today's Arabic Web
Reem Suwaileh, Mucahid Kutlu, Nihal Fathima, Tamer Elsayed, Matthew Lease
Pages: 673-676
doi>10.1145/2911451.2914677

Web crawls provide valuable snapshots of the Web which enable a wide variety of research, be it distributional analysis to characterize Web properties or use of language, content analysis in social science, or Information Retrieval (IR) research to develop ...
Building Test Collections for Evaluating Temporal IR
Hideo Joho, Adam Jatowt, Roi Blanco, Haitao Yu, Shuhei Yamamoto
Pages: 677-680
doi>10.1145/2911451.2914673

Research on temporal aspects of information retrieval has recently gained considerable interest within the Information Retrieval (IR) community. This paper describes our efforts for building test collections for the purpose of fostering temporal IR research. ...
DAJEE: A Dataset of Joint Educational Entities for Information Retrieval in Technology Enhanced Learning
Vladimir Estivill-Castro, Carla Limongelli, Matteo Lombardi, Alessandro Marani
Pages: 681-684
doi>10.1145/2911451.2914670

In the Technology Enhanced Learning (TEL) community, the problem of conducting reproducible evaluations of recommender systems is still open, due to the lack of exhaustive benchmarks. The few public datasets available in TEL have limitations, being mostly ...
Evaluating Retrieval over Sessions: The TREC Session Track 2011-2014
Ben Carterette, Paul Clough, Mark Hall, Evangelos Kanoulas, Mark Sanderson
Pages: 685-688
doi>10.1145/2911451.2914675

Information Retrieval (IR) research has traditionally focused on serving the best results for a single query - so-called ad hoc retrieval. However, users typically search iteratively, refining and reformulating their queries during a session. A key challenge ...
EveTAR: A New Test Collection for Event Detection in Arabic Tweets
Hind Almerekhi, Maram Hasanain, Tamer Elsayed
Pages: 689-692
doi>10.1145/2911451.2914681

Research on event detection in Twitter is often obstructed by the lack of publicly available evaluation mechanisms such as test collections; this problem is more severe for languages other than English, where such collections are even scarcer. In this paper, ...
GNMID14: A Collection of 110 Million Global Music Identification Matches
Cameron Summers, Greg Tronel, Jason Cramer, Aneesh Vartakavi, Phillip Popp
Pages: 693-696
doi>10.1145/2911451.2914679

A new dataset composed of music identification matches from Gracenote, a leading global music metadata company, is presented. Matches from January 1, 2014 to December 31, 2014 have been curated and made available as a public dataset called Gracenote Music ...
Longitudinal Navigation Log Data on a Large Web Domain
Suzan Verberne, Bram Arends, Wessel Kraaij, Arjen de Vries
Pages: 697-700
doi>10.1145/2911451.2914667

We have collected the access logs for our university's web domain over a time span of 4.5 years. We now release the pre-processed data of a 3-month period for research into user navigation behavior. We preprocessed the data so that only successful GET ...
New Collection Announcement: Focused Retrieval Over the Web
Ivan Habernal, Maria Sukhareva, Fiana Raiber, Anna Shtok, Oren Kurland, Hadar Ronen, Judit Bar-Ilan, Iryna Gurevych
Pages: 701-704
doi>10.1145/2911451.2914682

Focused retrieval (a.k.a. passage retrieval) is important in its own right and as an intermediate step in question answering systems. We present a new Web-based collection for focused retrieval. The document corpus is the Category A of the ClueWeb12 ...
NTCIR Lifelog: The First Test Collection for Lifelog Research
Cathal Gurrin, Hideo Joho, Frank Hopfgartner, Liting Zhou, Rami Albatal
Pages: 705-708
doi>10.1145/2911451.2914680

Test collections have a long history of supporting repeatable and comparable evaluation in Information Retrieval (IR). However, thus far, no shared test collection exists for IR systems that are designed to index and retrieve multimodal lifelog data. ...
SOGOU-2012-CRAWL: A Crawl of Search Results in the Sogou 2012 Chinese Query Log
Stewart Whiting, Joemon M. Jose, Omar Alonso
Pages: 709-712
doi>10.1145/2911451.2914668

In 2012, Sogou, a major Chinese web search engine, released a large-scale query log containing 43.5M user interactions, including submitted queries and clicked web page search results. This query log offers a deep sample of queries over a two-day period ...
The BOLT IR Test Collections of Multilingual Passage Retrieval from Discussion Forums
Ian Soboroff, Kira Griffitt, Stephanie Strassel
Pages: 713-716
doi>10.1145/2911451.2914674

This paper describes a new test collection for passage retrieval from multilingual, informal text. The task being modeled is that of a monolingual English-speaking user who wishes to search discussion forum text in a foreign language. The system retrieves ...
The Factoid Queries Collection
Ido Guy, Dan Pelleg
Pages: 717-720
doi>10.1145/2911451.2914676

We present a collection of over 15,000 queries, issued to commercial web search engines, whose answer is a single fact. The collection was produced based on queries landing on questions within a large community question answering website, each with a ...
The LExR Collection for Expertise Retrieval in Academia
Vitor Mangaravite, Rodrygo L.T. Santos, Isac S. Ribeiro, Marcos André Gonçalves, Alberto H.F. Laender
Pages: 721-724
doi>10.1145/2911451.2914678

Expertise retrieval has been the subject of intense research over the past decade, particularly with the public availability of benchmark test collections for expertise retrieval in enterprises. Another domain which has seen comparatively less research ...
UQV100: A Test Collection with Query Variability
Peter Bailey, Alistair Moffat, Falk Scholer, Paul Thomas
Pages: 725-728
doi>10.1145/2911451.2914671

We describe the UQV100 test collection, designed to incorporate variability from users. Information need "backstories" were written for 100 topics (or sub-topics) from the TREC 2013 and 2014 Web Tracks. Crowd workers were asked to read the backstories, ...
SESSION: Short Research Papers
A Dynamic Recurrent Model for Next Basket Recommendation
Feng Yu, Qiang Liu, Shu Wu, Liang Wang, Tieniu Tan
Pages: 729-732
doi>10.1145/2911451.2914683

Next basket recommendation has become an increasing concern. Most conventional models explore either sequential transaction features or general interests of users. Further, some works treat users' general interests and sequential behaviors as two totally ...
A Simple Enhancement for Ad-hoc Information Retrieval via Topic Modelling
Fanghong Jian, Jimmy Xiangji Huang, Jiashu Zhao, Tingting He, Po Hu
Pages: 733-736
doi>10.1145/2911451.2914748

Traditional information retrieval (IR) models, in which a document is normally represented as a bag of words and their frequencies, capture the term-level and document-level information. Topic models, on the other hand, discover semantic topic-based ...
An Empirical Study of Learning to Rank for Entity Search
Jing Chen, Chenyan Xiong, Jamie Callan
Pages: 737-740
doi>10.1145/2911451.2914725

This work investigates the effectiveness of learning to rank methods for entity search. Entities are represented by multi-field documents constructed from their RDF triples, and field-based text similarity features are extracted for query-entity pairs. ...
An Exploration of Evaluation Metrics for Mobile Push Notifications
Luchen Tan, Adam Roegiest, Jimmy Lin, Charles L.A. Clarke
Pages: 741-744
doi>10.1145/2911451.2914694

How do we evaluate systems that filter social media streams and send users updates via push notifications on their mobile phones? Such notifications must be relevant, timely, and novel. In this paper, we explore various evaluation metrics for this task, ...
An Improved Multileaving Algorithm for Online Ranker Evaluation
Brian Brost, Ingemar J. Cox, Yevgeny Seldin, Christina Lioma
Pages: 745-748
doi>10.1145/2911451.2914706

Online ranker evaluation is a key challenge in information retrieval. An important task in the online evaluation of rankers is using implicit user feedback for inferring preferences between rankers. Interleaving methods have been found to be efficient ...
An Unsupervised Approach to Anomaly Detection in Music Datasets
Yen-Cheng Lu, Chih-Wei Wu, Chang-Tien Lu, Alexander Lerch
Pages: 749-752
doi>10.1145/2911451.2914700

This paper presents an unsupervised method for systematically identifying anomalies in music datasets. The model integrates categorical regression and robust estimation techniques to infer anomalous scores in music clips. When applied to a music genre ...
Anonymizing Query Logs by Differential Privacy
Sicong Zhang, Hui Yang, Lisa Singh
Pages: 753-756
doi>10.1145/2911451.2914732

Query logs are valuable resources for Information Retrieval (IR) research. However, because they are also rich in private and personal information, serious concerns about leaking user privacy prevent query logs from being shared from the search companies ...
Audio Features Affected by Music Expressiveness: Experimental Setup and Preliminary Results on Tuba Players
Alberto Introini, Giorgio Presti, Giuseppe Boccignone
Pages: 757-760
doi>10.1145/2911451.2914690

Within a Music Information Retrieval perspective, the goal of the study presented here is to investigate the impact on sound features of the musician's affective intention, namely when trying to intentionally convey emotional contents via expressiveness. ...
Automatic Identification and Contextual Reformulation of Implicit System-Related Queries
Adam Fourney, Susan T. Dumais
Pages: 761-764
doi>10.1145/2911451.2914701

Web search functionality is increasingly integrated into operating systems, software applications, and other interactive environments that extend beyond the traditional web browser. In particular, intelligent virtual assistants (e.g., Microsoft Cortana ...
Axiomatic Analysis for Improving the Log-Logistic Feedback Model
Ali Montazeralghaem, Hamed Zamani, Azadeh Shakery
Pages: 765-768
doi>10.1145/2911451.2914768

Pseudo-relevance feedback (PRF) has been proven to be an effective query expansion strategy to improve retrieval performance. Several PRF methods have so far been proposed for many retrieval models. Recent theoretical studies of PRF methods show that ...
Balancing Relevance Criteria through Multi-Objective Optimization
Joost van Doorn, Daan Odijk, Diederik M. Roijers, Maarten de Rijke
Pages: 769-772
doi>10.1145/2911451.2914708

Offline evaluation of information retrieval systems typically focuses on a single effectiveness measure that models the utility for a typical user. Such a measure usually combines a behavior-based rank discount with a notion of document utility that ...
Build Emotion Lexicon from the Mood of Crowd via Topic-Assisted Joint Non-negative Matrix Factorization
Kaisong Song, Wei Gao, Ling Chen, Shi Feng, Daling Wang, Chengqi Zhang
Pages: 773-776
doi>10.1145/2911451.2914759

In research on building emotion lexicons, we witness the exploitation of crowd-sourced affective annotations given by readers of online news articles. Such an approach ignores the relationship between topics and emotion expressions which are often closely ...
Burst Detection in Social Media Streams for Tracking Interest Profiles in Real Time
Cody Buntain, Jimmy Lin
Pages: 777-780
doi>10.1145/2911451.2914733

This work presents RTTBurst, an end-to-end system for ingesting descriptions of user interest profiles and discovering new and relevant tweets based on those interest profiles using a simple model for identifying bursts in token usage. Our approach differs ...
Cluster-based Joint Matrix Factorization Hashing for Cross-Modal Retrieval
Dimitrios Rafailidis, Fabio Crestani
Pages: 781-784
doi>10.1145/2911451.2914710

Cross-modal retrieval has been an emerging topic in recent years, as modern applications have to efficiently search for multimedia documents with different modalities. In this study, we propose a cross-modal hashing method by following a cluster-based ...
Collaborative Ranking with Social Relationships for Top-N Recommendations
Dimitrios Rafailidis, Fabio Crestani
Pages: 785-788
doi>10.1145/2911451.2914711

Recommendation systems have gained a lot of attention because of their importance for handling the unprecedentedly large amount of available content on the Web, such as movies, music, books, etc. Although Collaborative Ranking (CR) models can produce ...
Community-based Cyberreading for Information Understanding
Zhuoren Jiang, Xiaozhong Liu, Liangcai Gao, Zhi Tang
Pages: 789-792
doi>10.1145/2911451.2914744

Although the content in scientific publications is increasingly challenging, it is necessary to investigate another important problem, that of scientific information understanding. For this proposed problem, we investigate novel methods to assist scholars ...
Computational Creativity Based Video Recommendation
Wei Lu, Fu-lai Chung
Pages: 793-796
doi>10.1145/2911451.2914707

Computational creativity, as an emerging domain of application, emphasizes the use of big data to automatically design new knowledge. Based on the availability of complex multi-relational data, one aspect of computational creativity is to infer unexplored ...
Controversy Detection in Wikipedia Using Collective Classification
Shiri Dori-Hacohen, David Jensen, James Allan
Pages: 797-800
doi>10.1145/2911451.2914745

Concerns over personalization in IR have sparked an interest in detection and analysis of controversial topics. Accurate detection would enable many beneficial applications, such as alerting search users to controversy. Wikipedia's broad coverage and ...
Discovering Author Interest Evolution in Topic Modeling
Min Yang, Jincheng Mei, Fei Xu, Wenting Tu, Ziyu Lu
Pages: 801-804
doi>10.1145/2911451.2914723

Discovering the author's interest over time from documents has important applications in recommendation systems, authorship identification and opinion extraction. In this paper, we propose an interest drift model (IDM), which monitors the evolution of ...
Distributional Random Oversampling for Imbalanced Text Classification
Alejandro Moreo, Andrea Esuli, Fabrizio Sebastiani
Pages: 805-808
doi>10.1145/2911451.2914722

The accuracy of many classification algorithms is known to suffer when the data are imbalanced (i.e., when the distribution of the examples across the classes is severely skewed). Many applications of binary text classification are of this type, with ...
Doc2Sent2Vec: A Novel Two-Phase Approach for Learning Document Representation
Ganesh J, Manish Gupta, Vasudeva Varma
Pages: 809-812
doi>10.1145/2911451.2914717

Doc2Sent2Vec is an unsupervised approach to learn a low-dimensional feature vector (or embedding) for a document. This embedding captures the semantics of the document and can be fed as input to machine learning algorithms to address a myriad of applications ...
Dynamically Integrating Item Exposure with Rating Prediction in Collaborative Filtering
Ting-Yi Shih, Ting-Chang Hou, Jian-De Jiang, Yen-Chieh Lien, Chia-Rui Lin, Pu-Jen Cheng
Pages: 813-816
doi>10.1145/2911451.2914769

The paper proposes a novel approach to appropriately promote those items with few ratings in collaborative filtering. Different from previous works, we force the items with few ratings to be promoted to the users who would potentially be able to give ...
Effective Trend Detection within a Dynamic Search Context
Anat Hashavit, Roy Levin, Ido Guy, Gilad Kutiel
Pages: 817-820
doi>10.1145/2911451.2914705

In recent years, studies about trend detection in online social media streams have begun to emerge. Since not all users are likely to always be interested in the same set of trends, some of the research has also focused on personalizing the trends by using ...
Enhancing First Story Detection using Word Embeddings
Sean Moran, Richard McCreadie, Craig Macdonald, Iadh Ounis
Pages: 821-824
doi>10.1145/2911451.2914719

In this paper we show how word embeddings can be used to increase the effectiveness of a state-of-the-art Locality Sensitive Hashing (LSH) based first story detection (FSD) system over a standard tweet corpus. Vocabulary mismatch, in which related tweets ...
Examining the Coherence of the Top Ranked Tweet Topics
Anjie Fang, Craig Macdonald, Iadh Ounis, Philip Habel
Pages: 825-828
doi>10.1145/2911451.2914731

Topic modelling approaches help scholars to examine the topics discussed in a corpus. Due to the popularity of Twitter, two distinct methods have been proposed to accommodate the brevity of tweets: the tweet pooling method and Twitter LDA. Both of these ...
Explicit In Situ User Feedback for Web Search Results
Jin Young Kim, Jaime Teevan, Nick Craswell
Pages: 829-832
doi>10.1145/2911451.2914754

Gathering evidence about whether a search result is relevant is a core concern in the evaluation and improvement of information retrieval systems. Two common sources of evidence for establishing relevance are judgements from trained assessors and logs ...
Exploiting CPU SIMD Extensions to Speed-up Document Scoring with Tree Ensembles
Claudio Lucchese, Franco Maria Nardini, Salvatore Orlando, Raffaele Perego, Nicola Tonellotto, Rossano Venturini
Pages: 833-836
doi>10.1145/2911451.2914758

Scoring documents with learning-to-rank (LtR) models based on large ensembles of regression trees is currently deemed one of the best solutions to effectively rank query results to be returned by large scale Information Retrieval systems. This paper ...
Exploiting Semantic Coherence Features for Information Retrieval
Xinhui Tu, Jimmy Xiangji Huang, Jing Luo, Tingting He
Pages: 837-840
doi>10.1145/2911451.2914691

Most of the existing information retrieval models assume that the terms of a text document are independent of each other. These retrieval models integrate three major variables to determine the degree of importance of a term for a document: within document ...
Extracting Information Seeking Intentions for Web Search Sessions
Matthew Mitsui, Chirag Shah, Nicholas J. Belkin
Pages: 841-844
doi>10.1145/2911451.2914746

We present a method for extracting the self-reported intentions of users engaged in an information seeking episode. We recruited participants to conduct search sessions and subsequently asked them to self-report their intentions. A total of 27 users ...
First Story Detection using Multiple Nearest Neighbors
Jeroen B.P. Vuurens, Arjen P. de Vries
Pages: 845-848
doi>10.1145/2911451.2914761

First Story Detection (FSD) systems aim to identify those news articles that discuss an event that was not reported before. Recent work on FSD has focussed almost exclusively on efficiently detecting documents that are dissimilar from their nearest neighbor. ...
Health Monitoring on Social Media over Time
Sumit Sidana, Shashwat Mishra, Sihem Amer-Yahia, Marianne Clausel, Massih-Reza Amini
Pages: 849-852
doi>10.1145/2911451.2914697

Social media has become a major source for analyzing all aspects of daily life. Thanks to dedicated latent topic analysis methods such as the Ailment Topic Aspect Model (ATAM), public health can now be observed on Twitter. In this work, we are interested ...
How Informative is a Term?: Dispersion as a measure of Term Specificity
Rodney McDonell, Justin Zobel, Bodo Billerbeck
Pages: 853-856
doi>10.1145/2911451.2914687

Similarity functions assign scores to documents in response to queries. These functions require as input statistics about the terms in the queries and documents, where the intention is that the statistics are estimates of the relative informativeness ...
Identifying Careless Workers in Crowdsourcing Platforms: A Game Theory Approach
Yashar Moshfeghi, Alvaro F. Huertas-Rosero, Joemon M. Jose
Pages: 857-860
doi>10.1145/2911451.2914756

In this paper we introduce a game scenario for crowdsourcing (CS) using incentives as bait for careless (gambler) workers, who respond to them in a characteristic way. We hypothesise that careless workers are risk-inclined and can be detected in the ...
Impact of Review-Set Selection on Human Assessment for Text Classification
Adam Roegiest, Gordon V. Cormack
Pages: 861-864
doi>10.1145/2911451.2914709

In a laboratory study, human assessors were significantly more likely to judge the same documents as relevant when they were presented for assessment within the context of documents selected using random or uncertainty sampling, as compared to relevance ...
Improving Automated Controversy Detection on the Web
Myungha Jang, James Allan
Pages: 865-868
doi>10.1145/2911451.2914764

Automatically detecting controversy on the Web is a useful capability for a search engine to help users review web content with a more balanced and critical view. The current state-of-the-art approach is to find K-Nearest-Neighbors in Wikipedia to the ...
Improving Language Estimation with the Paragraph Vector Model for Ad-hoc Retrieval
Qingyao Ai, Liu Yang, Jiafeng Guo, W. Bruce Croft
Pages: 869-872
doi>10.1145/2911451.2914688

Incorporating topic level estimation into language models has been shown to be beneficial for information retrieval (IR) models such as cluster-based retrieval and LDA-based document representation. Neural embedding models, such as paragraph vector (PV) ...
Improving Retrieval Quality Using Pseudo Relevance Feedback in Content-Based Image Retrieval
Dinesha Chathurani Nanayakkara Wasam Uluwitige, Timothy Chappell, Shlomo Geva, Vinod Chandran
Pages: 873-876
doi>10.1145/2911451.2914747

The increased availability of image capturing devices has enabled collections of digital images to rapidly expand in both size and diversity. This has created a constantly growing need for efficient and effective image browsing, searching, and retrieval ...
Ingrams: A Neuropsychological Explanation For Why People Search
Peter Bailey, Nick Craswell
Pages: 877-880
doi>10.1145/2911451.2914712

Why do people start a search? Why do they stop? Why do they do what they do in-between? Our goal in this paper is to provide a simple yet general explanation for these acts that has its basis in neuropsychology and observed user behavior. We coin the ...
Investment Recommendation using Investor Opinions in Social Media
Wenting Tu, David W. Cheung, Nikos Mamoulis, Min Yang, Ziyu Lu
Pages: 881-884
doi>10.1145/2911451.2914699

Investor social media, such as StockTwits, are gaining increasing popularity. These sites allow users to post their investing opinions and suggestions in the form of microblogs. Given the growth of the posted data, a significant and challenging research ...
"Is Sven Seven?": A Search Intent Module for Children
Nevena Dragovic, Ion Madrazo Azpiazu, Maria Soledad Pera
Pages: 885-888
doi>10.1145/2911451.2914738

The Internet is the biggest data-sharing platform, composed of an immeasurable quantity of resources covering diverse topics appealing to users of all ages. Children shape tomorrow's society, so it is essential that this audience becomes agile with ...
Is This Your Final Answer?: Evaluating the Effect of Answers on Good Abandonment in Mobile Search
Kyle Williams, Julia Kiseleva, Aidan C. Crook, Imed Zitouni, Ahmed Hassan Awadallah, Madian Khabsa
Pages: 889-892
doi>10.1145/2911451.2914736

Answers on mobile search result pages have become a common way to attempt to satisfy users without them needing to click on search results. Many different types of answers exist, such as weather, flight and currency answers. Understanding the effect ...
Jointly Modeling Review Content and Aspect Ratings for Review Rating Prediction
Zhipeng Jin, Qiudan Li, Daniel D. Zeng, YongCheng Zhan, Ruoran Liu, Lei Wang, Hongyuan Ma
Pages: 893-896
doi>10.1145/2911451.2914692

Review rating prediction is of much importance for sentiment analysis and business intelligence. Existing methods work well when aspect-opinion pairs can be accurately extracted from review texts and aspect ratings are complete. The challenges of improving ...
Learning to Project and Binarise for Hashing Based Approximate Nearest Neighbour Search
Sean Moran
Pages: 897-900
doi>10.1145/2911451.2914766

In this paper we focus on improving the effectiveness of hashing-based approximate nearest neighbour search. Generating similarity preserving hashcodes for images has been shown to be an effective and efficient method for searching through large datasets. ...
Linking Organizational Social Network Profiles
Jerome Cheng, Kazunari Sugiyama, Min-Yen Kan
Pages: 901-904
doi>10.1145/2911451.2914698

Many organizations possess social media accounts on different social networks, but these profiles are not always linked. End applications, users, as well as the organizations themselves, can benefit when the profiles are appropriately identified and linked. ...
Load-Balancing in Distributed Selective Search
Yubin Kim, Jamie Callan, J. Shane Culpepper, Alistair Moffat
Pages: 905-908
doi>10.1145/2911451.2914689

Simulation and analysis have shown that selective search can reduce the cost of large-scale distributed information retrieval. By partitioning the collection into small topical shards, and then using a resource ranking algorithm to choose a subset of ...
Multi-Rate Deep Learning for Temporal Recommendation
Yang Song, Ali Mamdouh Elkahky, Xiaodong He
Pages: 909-912
doi>10.1145/2911451.2914726

Modeling temporal behavior in recommendation systems is an important and challenging problem. Its challenges come from the fact that temporal modeling increases the cost of parameter estimation and inference, while requiring a large amount of data to reliably ...
Network-Aware Recommendations of Novel Tweets
Noor Aldeen Alawad, Aris Anagnostopoulos, Stefano Leonardi, Ida Mele, Fabrizio Silvestri
Pages: 913-916
doi>10.1145/2911451.2914760

With the rapid proliferation of microblogging services such as Twitter, a large number of tweets are published every day, often making users feel overwhelmed with information. Helping these users to discover potentially interesting tweets is an important ...
Not All Links Are Created Equal: An Adaptive Embedding Approach for Social Personalized Ranking
Qing Zhang, Houfeng Wang
Pages: 917-920
doi>10.1145/2911451.2914740

With a large amount of complex network data available, most existing recommendation models consider exploiting rich user social relations for better interest targeting. In these approaches, the underlying assumption is that similar users in social networks ...
On a Topic Model for Sentences
Georgios Balikas, Massih-Reza Amini, Marianne Clausel
Pages: 921-924
doi>10.1145/2911451.2914714

Probabilistic topic models are generative models that describe the content of documents by discovering the latent topics underlying them. However, the structure of the textual input, and for instance the grouping of words in coherent text spans such ...
On Information-Theoretic Document-Person Associations for Expert Search in Academia
Vitor Mangaravite, Rodrygo L.T. Santos
Pages: 925-928
doi>10.1145/2911451.2914751

State-of-the-art expert search approaches rely on document-person associations to infer the expertise of a candidate person for a given query. Such associations have traditionally been modeled as boolean variables, indicating whether or not a candidate ...
On the Applicability of Delicious for Temporal Search on Web Archives
Helge Holzmann, Wolfgang Nejdl, Avishek Anand
Pages: 929-932
doi>10.1145/2911451.2914724

Web archives are large longitudinal collections that store webpages from the past, which might be missing on the current live Web. Consequently, temporal search over such collections is essential for finding prominent missing webpages and tasks like ...
On the Effectiveness of Contextualisation Techniques in Spoken Query Spoken Content Retrieval
David N. Racca, Gareth J.F. Jones
Pages: 933-936
doi>10.1145/2911451.2914730

In passage and XML retrieval, contextualisation techniques seek to improve the rank of a relevant element by considering information from its surrounding elements and its container document. Recent research has demonstrated that some of these techniques ...
Ordinal Text Quantification
Giovanni Da San Martino, Wei Gao, Fabrizio Sebastiani
Pages: 937-940
doi>10.1145/2911451.2914749

In recent years there has been a growing interest in text quantification, a supervised learning task where the goal is to accurately estimate, in an unlabelled set of items, the prevalence (or "relative frequency") of each class c in a predefined ...
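A common naive baseline for quantification, usually called classify and count, labels every unlabelled item with a classifier and reports the fraction of items assigned to each class. The Python sketch below shows only that baseline: it ignores both classifier bias and the ordering between classes that ordinal quantification is concerned with, and the function name `classify_and_count` is hypothetical.

```python
from collections import Counter


def classify_and_count(predicted_labels, classes):
    """Estimate each class's prevalence as the share of items predicted to be in it.

    Hypothetical helper illustrating the classify-and-count baseline only.
    """
    if not predicted_labels:
        raise ValueError("need at least one classified item")
    counts = Counter(predicted_labels)
    total = len(predicted_labels)
    # Classes that were never predicted get an estimated prevalence of 0.0.
    return {c: counts[c] / total for c in classes}
```

For example, predictions `["pos", "pos", "neu", "neg", "pos"]` over the classes `neg`, `neu`, `pos` yield estimated prevalences of 0.2, 0.2 and 0.6.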
Pearson Rank: A Head-Weighted Gap-Sensitive Score-Based Correlation Coefficient
Ning Gao, Mossaab Bagdouri, Douglas W. Oard
Pages: 941-944
doi>10.1145/2911451.2914728

One way of evaluating the reusability of a test collection is to determine whether removing the unique contributions of some system would alter the preference order between that system and others. Rank correlation measures such as Kendall's tau are often ...
Polarized User and Topic Tracking in Twitter
Mauro Coletto, Claudio Lucchese, Salvatore Orlando, Raffaele Perego
Pages: 945-948
doi>10.1145/2911451.2914716

Digital traces of conversations in micro-blogging platforms and OSNs provide information about user opinion with a high degree of resolution. These information sources can be exploited to understand and monitor collective behaviours. In this work, we ...
Post-Learning Optimization of Tree Ensembles for Efficient Ranking
Claudio Lucchese, Franco Maria Nardini, Salvatore Orlando, Raffaele Perego, Fabrizio Silvestri, Salvatore Trani
Pages: 949-952
doi>10.1145/2911451.2914763

Learning to Rank (LtR) is the machine learning method of choice for producing high quality document ranking functions from a ground-truth of training examples. In practice, efficiency and effectiveness are intertwined concepts and trading off effectiveness ...
Quit While Ahead: Evaluating Truncated Rankings
Fei Liu, Alistair Moffat, Timothy Baldwin, Xiuzhen Zhang
Pages: 953-956
doi>10.1145/2911451.2914737

Many types of search tasks are answered through the computation of a ranked list of suggested answers. We re-examine the usual assumption that answer lists should be as long as possible, and suggest that when the number of matching items is potentially ...
Quote Recommendation in Dialogue using Deep Neural Network
Hanbit Lee, Yeonchan Ahn, Haejun Lee, Seungdo Ha, Sang-goo Lee
Pages: 957-960
doi>10.1145/2911451.2914734

Quotes, or quotations, are well-known phrases or sentences that we use for various purposes such as emphasis, elaboration, and humor. In this paper, we introduce the task of recommending quotes that are suitable for a given dialogue context and we present ...
Ranking Documents Through Stochastic Sampling on Bayesian Network-based Models: A Pilot Study
Xing Tan, Jimmy Xiangji Huang, Aijun An
Pages: 961-964
doi>10.1145/2911451.2914750

Using approximate inference techniques, we investigate in this paper the applicability of Bayesian Networks to the problem of ranking a large set of documents. The topology of the network is bipartite. Network parameters (conditional probability distributions) ...
Ranking Health Web Pages with Relevance and Understandability
Joao Palotti, Lorraine Goeuriot, Guido Zuccon, Allan Hanbury
Pages: 965-968
doi>10.1145/2911451.2914741

We propose a method that integrates relevance and understandability to rank health web documents. We use a learning to rank approach with standard retrieval features to determine topical relevance and additional features based on readability measures ...
Rethinking the Cost of Information Search Behavior
Yinglong Zhang, Jacek Gwizdka
Pages: 969-972
doi>10.1145/2911451.2914742

In this paper, we present a cognitive-economic approach to examining the cost in information search. Unlike previous studies on economic models, we calculated the cost in information search based on participants' eye-tracking data as well as their behavioral ...
Retrievability of Code Mixed Microblogs
Debasis Ganguly, Ayan Bandyopadhyay, Mandar Mitra, Gareth J.F. Jones
Pages: 973-976
doi>10.1145/2911451.2914727

Mixing multiple languages within the same document, a phenomenon called (linguistic) code mixing or code switching, is a frequent trend among multilingual users of social media. In the context of information retrieval (IR), code mixing may affect retrieval ...
Retweeting Behavior Prediction Based on One-Class Collaborative Filtering in Social Networks
Bo Jiang, Jiguang Liang, Ying Sha, Rui Li, Wei Liu, Hongyuan Ma, Lihong Wang
Pages: 977-980
doi>10.1145/2911451.2914713

Social behaviors such as retweets, comments or likes are valuable information for analyzing human activities. We focus here on users' retweeting behavior, which has been considered a key mechanism of information diffusion in social networks. Since ...
Sampling Strategies and Active Learning for Volume Estimation
Haotian Zhang, Jimmy Lin, Gordon V. Cormack, Mark D. Smucker
Pages: 981-984
doi>10.1145/2911451.2914685

This paper tackles the challenge of accurately and efficiently estimating the number of relevant documents in a collection for a particular topic. One real-world application is estimating the volume of social media posts (e.g., tweets) pertaining to ...
Search-based Evaluation from Truth Transcripts for Voice Search Applications
François Mairesse, Paul Raccuglia, Shiv Vitaladevuni
Pages: 985-988
doi>10.1145/2911451.2914735

Voice search applications are typically evaluated by comparing the predicted query to a reference human transcript, regardless of the search results returned by the query. While we find that an exact transcript match is highly indicative of user satisfaction, ...
Seeking Serendipity: A Living Lab Approach to Understanding Creative Retrieval in Broadcast Media Production
Sabrina Sauer, Maarten de Rijke
Pages: 989-992
doi>10.1145/2911451.2914721

This paper presents a method to map user needs and integrate serendipitous search behaviors in search algorithm development: the living lab approach. This user-centered design approach involves technology users during technology development to catch ...
Selectively Personalizing Query Auto-Completion
Fei Cai, Maarten de Rijke
Pages: 993-996
doi>10.1145/2911451.2914686

Query auto-completion (QAC) is being used by many of today's search engines. It helps searchers formulate queries by providing a list of query completions after entering an initial prefix of a query. To cater for a user's specific information needs, ...
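As a point of reference, the simplest non-personalized form of QAC ranks logged queries that start with the typed prefix by how often they were issued, often called most-popular completion. The sketch below assumes an in-memory list of past queries; it is a baseline illustration, not the selective personalization method of the paper, and `most_popular_completions` is a hypothetical name.

```python
from collections import Counter


def most_popular_completions(query_log, prefix, k=3):
    """Return up to k logged queries starting with `prefix`, most frequent first.

    Hypothetical most-popular-completion baseline (no personalization).
    """
    counts = Counter(q for q in query_log if q.startswith(prefix))
    # Sort by descending frequency; break ties alphabetically so output is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [query for query, _ in ranked[:k]]
```

A personalized variant would re-rank these candidates using the individual searcher's history, which is where deciding when to personalize becomes relevant.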
SG++: Word Representation with Sentiment and Negation for Twitter Sentiment Classification
Qinmin Hu, Yijun Pei, Qin Chen, Liang He
Pages: 997-1000
doi>10.1145/2911451.2914718

Here we propose an advanced Skip-gram model that incorporates both word sentiment and negation information. In particular, there is a softmax layer for the word sentiment polarity on top of the Skip-gram model. Then, two parallel embedding layers are set ...
SGT Framework: Social, Geographical and Temporal Relevance for Recreational Queries in Web Search
Stewart Whiting, Omar Alonso
Pages: 1001-1004
doi>10.1145/2911451.2914743

While location-based social networks (LBSNs) have become widely used for sharing and consuming location information, a large number of users turn to general web search engines for recreational activity ideas. In these cases, users typically express a ...
SimCC-AT: A Method to Compute Similarity of Scientific Papers with Automatic Parameter Tuning
Masoud Reyhani Hamedani, Sang-Wook Kim
Pages: 1005-1008
doi>10.1145/2911451.2914715

In this paper, we propose SimCC-AT (similarity based on content and citations with automatic parameter tuning) to compute the similarity of scientific papers. As in SimCC, the state-of-the-art method, we exploit a notion of a contribution score in similarity ...
Simple Dynamic Emission Strategies for Microblog Filtering
Luchen Tan, Adam Roegiest, Charles L.A. Clarke, Jimmy Lin
Pages: 1009-1012
doi>10.1145/2911451.2914704

Push notifications from social media provide a method to keep up-to-date on topics of personal interest. To be effective, notifications must achieve a balance between pushing too much and pushing too little. Push too little and the user misses important ...
Subspace Clustering Based Tag Sharing for Inductive Tag Matrix Refinement with Complex Errors
Yuqing Hou, Zhouchen Lin, Jin-ge Yao
Pages: 1013-1016
doi>10.1145/2911451.2914693

Annotating images with tags is useful for indexing and retrieving images. However, many available annotation data include missing or inaccurate annotations. In this paper, we propose an image annotation framework which sequentially performs tag completion ...
Temporal Query Intent Disambiguation using Time-Series Data
Yue Zhao, Claudia Hauff
Pages: 1017-1020
doi>10.1145/2911451.2914767

Understanding temporal intents behind users' queries is essential to meet users' time-related information needs. In order to classify queries according to their temporal intent (e.g. Past or Future), we explore the usage of time-series data derived from ...
To Blend or Not to Blend?: Perceptual Speed, Visual Memory and Aggregated Search
Lauren Turpin, Diane Kelly, Jaime Arguello
Pages: 1021-1024
doi>10.1145/2911451.2914739

While aggregated search interfaces that present vertical results to searchers are fairly common in today's search environments, little is known about how searchers' cognitive abilities impact how they use and evaluate these interfaces. This study evaluates ...
Topic Model based Privacy Protection in Personalized Web Search
Wasi Uddin Ahmad, Md Masudur Rahman, Hongning Wang
Pages: 1025-1028
doi>10.1145/2911451.2914753

Modern search engines utilize users' search history for personalization, which provides more effective, useful and relevant search results. However, it also has the potential risk of revealing users' privacy by identifying their underlying intention ...
Topic Quality Metrics Based on Distributed Word Representations
Sergey I. Nikolenko
Pages: 1029-1032
doi>10.1145/2911451.2914720

Automated evaluation of topic quality remains an important unsolved problem in topic modeling and represents a major obstacle for development and evaluation of new topic models. Previous attempts at the problem have been formulated as variations on the ...
Toward Estimating the Rank Correlation between the Test Collection Results and the True System Performance
Julián Urbano, Mónica Marrero
Pages: 1033-1036
doi>10.1145/2911451.2914752

The Kendall τ and AP rank correlation coefficients have become mainstream in Information Retrieval research for comparing the rankings of systems produced by two different evaluation conditions, such as different effectiveness measures or pool depths. ...
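For reference, Kendall's tau between two evaluation conditions can be computed directly from the per-system scores: count the system pairs both conditions order the same way (concordant) and the pairs they order oppositely (discordant). The Python sketch below assumes no tied scores (the tau-a form) and does not implement the AP correlation; `kendall_tau` and the example scores are hypothetical.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau (tau-a, no ties assumed) between two score lists.

    scores_a[i] and scores_b[i] are the scores of system i under two
    evaluation conditions (e.g. two pool depths).
    """
    n = len(scores_a)
    if n != len(scores_b) or n < 2:
        raise ValueError("need two equal-length lists with at least two systems")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Positive product: both conditions order systems i and j the same way.
        s = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

With hypothetical MAP scores `[0.30, 0.25, 0.20, 0.15]` and `[0.28, 0.26, 0.14, 0.18]`, only the last two systems swap order, giving 5 concordant and 1 discordant pair, so tau = 4/6, about 0.67.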
Tracking Sentiment by Time Series Analysis
Anastasia Giachanou, Fabio Crestani
Pages: 1037-1040
doi>10.1145/2911451.2914702

In recent years social media have emerged as popular platforms for people to share their thoughts and opinions on all kinds of topics. Tracking opinion over time is a powerful tool that can be used for sentiment prediction or to detect the possible reasons ...
Tweet2Vec: Learning Tweet Embeddings Using Character-level CNN-LSTM Encoder-Decoder
Soroush Vosoughi, Prashanth Vijayaraghavan, Deb Roy
Pages: 1041-1044
doi>10.1145/2911451.2914762

We present Tweet2Vec, a novel method for generating general-purpose vector representations of tweets. The model learns tweet embeddings using a character-level CNN-LSTM encoder-decoder. We trained our model on 3 million randomly selected English-language ...
Two Sample T-tests for IR Evaluation: Student or Welch?
Tetsuya Sakai
Pages: 1045-1048
doi>10.1145/2911451.2914684

There are two well-known versions of the t-test for comparing means from unpaired data: Student's t-test and Welch's t-test. While Welch's t-test does not assume homoscedasticity (i.e., equal variances), it involves approximations. ...
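The difference between the two tests can be sketched in a few lines of Python using only the standard library. This is a generic illustration, not code from the paper: Student's test pools the two sample variances, while Welch's test keeps them separate and approximates the degrees of freedom with the Welch-Satterthwaite formula. The function name `student_and_welch` is hypothetical.

```python
import math
from statistics import mean, variance


def student_and_welch(x, y):
    """Return (t_student, df_student, t_welch, df_welch) for two unpaired samples."""
    n1, n2 = len(x), len(y)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two observations")
    m1, m2 = mean(x), mean(y)
    v1, v2 = variance(x), variance(y)  # sample variances with n - 1 denominators

    # Student's t-test: pool the variances, assuming homoscedasticity.
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_student = (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    df_student = n1 + n2 - 2

    # Welch's t-test: keep the variances separate and approximate the
    # degrees of freedom (Welch-Satterthwaite); the result is usually non-integer.
    se1, se2 = v1 / n1, v2 / n2
    t_welch = (m1 - m2) / math.sqrt(se1 + se2)
    df_welch = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t_student, df_student, t_welch, df_welch
```

With equal sample sizes the two t statistics coincide, so the tests differ only in their degrees of freedom: for `[0.2, 0.4, 0.6]` versus `[0.1, 0.2, 0.3]`, Student's test uses 4 degrees of freedom and Welch's uses 50/17, about 2.94. With unequal sizes and variances the statistics themselves diverge.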
Uncovering Task Based Behavioral Heterogeneities in Online Search Behavior
Rishabh Mehrotra, Prasanta Bhattacharya, Emine Yilmaz
Pages: 1049-1052
doi>10.1145/2911451.2914755

While a major share of prior work has considered search sessions as the focal unit of analysis for seeking behavioral insights, search tasks are emerging as a competing perspective in this space. In the current work, we quantify user search task behavior ...
Understanding Website Behavior based on User Agent
Kien Pham, Aécio Santos, Juliana Freire
Pages: 1053-1056
doi>10.1145/2911451.2914757

Web sites have adopted a variety of adversarial techniques to prevent web crawlers from retrieving their content. While it is possible to simulate user behavior using a browser to crawl such sites, this approach is not scalable. Therefore, understanding ...
Using Word Embedding to Evaluate the Coherence of Topics from Twitter Data
Anjie Fang, Craig Macdonald, Iadh Ounis, Philip Habel
Pages: 1057-1060
doi>10.1145/2911451.2914729

Scholars often seek to understand topics discussed on Twitter using topic modelling approaches. Several coherence metrics have been proposed for evaluating the coherence of the topics generated by these approaches, including the pre-calculated Pointwise ...
Utilizing Focused Relevance Feedback
Elinor Brondwine, Anna Shtok, Oren Kurland
Pages: 1061-1064
doi>10.1145/2911451.2914695

We present a novel study of ad hoc retrieval methods utilizing document-level relevance feedback and/or focused relevance feedback; namely, passages marked as (non-)relevant. The first method uses a novel mixture model that integrates relevant ...
What Makes a Query Temporally Sensitive?
Craig Willis, Garrick Sherman, Miles Efron
Pages: 1065-1068
doi>10.1145/2911451.2914703

This work takes an in-depth look at the factors that affect manual classifications of 'temporally sensitive' information needs. We use qualitative and quantitative techniques to analyze 660 topics from the Text Retrieval Conference (TREC) previously ...
Which Information Sources are More Effective and Reliable in Video Search
Zhiyong Cheng, Xuanchong Li, Jialie Shen, Alexander G. Hauptmann
Pages: 1069-1072
doi>10.1145/2911451.2914765

It is common that users are interested in finding video segments that contain further information about the video contents in a segment of interest. To help users find and browse related video contents, video hyperlinking aims at constructing ...
Why do you Think this Query is Difficult?: A User Study on Human Query Prediction
Stefano Mizzaro, Josiane Mothe
Pages: 1073-1076
doi>10.1145/2911451.2914696

Predicting if a query will be difficult for a system is important to improve retrieval effectiveness by implementing specific processing. There have been several attempts to predict difficulty, both automatically and manually; but without high accuracy ...
DEMONSTRATION SESSION: Demonstrations
Craig Macdonald
A Platform for Streaming Push Notifications to Mobile Assessors
Adam Roegiest, Luchen Tan, Jimmy Lin, Charles L.A. Clarke
Pages: 1077-1080
doi>10.1145/2911451.2911463

We present an assessment platform for gathering online relevance judgments for mobile push notifications that will be deployed in the newly-created TREC 2016 Real-Time Summarization (RTS) track. There is emerging interest in building systems that filter ...
A Visual Analytics Approach for What-If Analysis of Information Retrieval Systems
Marco Angelini, Nicola Ferro, Giuseppe Santucci, Gianmaria Silvello
Pages: 1081-1084
doi>10.1145/2911451.2911462

We present the innovative visual analytics approach of the VATE system, which makes the experimental evaluation process easier and more effective by introducing what-if analysis. The what-if analysis is aimed at estimating the possible effects of ...
An Architecture for Privacy-Preserving and Replicable High-Recall Retrieval Experiments
Adam Roegiest, Gordon V. Cormack
Pages: 1085-1088
doi>10.1145/2911451.2911456

We demonstrate the infrastructure used in the TREC 2015 Total Recall track to facilitate controlled simulation of "assessor in the loop" high-recall retrieval experimentation. The implementation and corresponding design decisions are presented for this ...
Analysing Temporal Evolution of Interlingual Wikipedia Article Pairs
Simon Gottschalk, Elena Demidova
Pages: 1089-1092
doi>10.1145/2911451.2911472

Wikipedia articles representing an entity or a topic in different language editions evolve independently within the scope of the language-specific user communities. This can lead to different points of view reflected in the articles, as well as complementary ...
Cobwebs from the Past and Present: Extracting Large Social Networks using Internet Archive Data
Miroslav Shaltev, Jan-Hendrik Zab, Philipp Kemkes, Stefan Siersdorfer, Sergej Zerr
Pages: 1093-1096
doi>10.1145/2911451.2911467

Social graph construction from various sources has been of interest to researchers due to its application potential and the broad range of technical challenges involved. The World Wide Web provides a huge amount of continuously updated data and information ...
Context-Sensitive Auto-Completion for Searching with Entities and Categories
Andreas Schmidt, Johannes Hoffart, Dragan Milchevski, Gerhard Weikum
Pages: 1097-1100
doi>10.1145/2911451.2911461

When searching in a document collection by keywords, good auto-completion suggestions can be derived from query logs and corpus statistics. On the other hand, when querying documents which have automatically been linked to entities and semantic categories, ...
EAIMS: Emergency Analysis Identification and Management System
Richard McCreadie, Craig Macdonald, Iadh Ounis
Pages: 1101-1104
doi>10.1145/2911451.2911460

Social media has great potential as a means to enable civil protection and law enforcement agencies to more effectively tackle disasters and emergencies. However, there is currently a lack of tools that enable civil protection agencies to easily make ...
Expedition: A Time-Aware Exploratory Search System Designed for Scholars
Jaspreet Singh, Wolfgang Nejdl, Avishek Anand
Pages: 1105-1108
doi>10.1145/2911451.2911465

Archives are an important source of study for various scholars. Digitization and the web have made archives more accessible and led to the development of several time-aware exploratory search systems. However, these systems have been designed for more ...
iGlasses: A Novel Recommendation System for Best-fit Glasses
Xiaoling Gu, Lidan Shou, Pai Peng, Ke Chen, Sai Wu, Gang Chen
Pages: 1109-1112
doi>10.1145/2911451.2911453

We demonstrate iGlasses, a novel recommendation system that accepts a frontal face photo as the input and returns the best-fit eyeglasses as the output. As conventional recommendation techniques such as collaborative filtering become inapplicable ...
InfoScout: An Interactive, Entity Centric, Person Search Tool
Sean McKeown, Martynas Buivys, Leif Azzopardi
Pages: 1113-1116
doi>10.1145/2911451.2911468

Individuals living in highly networked societies publish a large amount of personal, and potentially sensitive, information online. Web investigators can exploit such information for a variety of purposes, such as in background vetting and fraud detection. ...
InLook: Revisiting Email Search Experience
Pranav Ramarao, Suresh Iyengar, Pushkar Chitnis, Raghavendra Udupa, Balasubramanyan Ashok
Pages: 1117-1120
doi>10.1145/2911451.2911458

Email remains the most important and widely used mode of online communication, despite having its origins in the middle of the last century and being threatened by a variety of online communication innovations. While several studies have predicted ...
Interacting with Financial Data using Natural Language
Vassilis Plachouras, Charese Smiley, Hiroko Bretz, Ola Taylor, Jochen L. Leidner, Dezhao Song, Frank Schilder
Pages: 1121-1124
doi>10.1145/2911451.2911457

Financial and economic data are typically available in the form of tables and consist mostly of monetary amounts, numeric and other domain-specific fields. They can be very hard to search and they are often made available out of context, or in forms ...
LONLIES: Estimating Property Values for Long Tail Entities
Mina Farid, Ihab F. Ilyas, Steven Euijong Whang, Cong Yu
Pages: 1125-1128
doi>10.1145/2911451.2911466

Web search engines often retrieve answers for queries about popular entities from a growing knowledge base that is populated by a continuous information extraction process. However, less popular entities are not frequently mentioned on the web and are ...
Personalised News and Blog Recommendations based on User Location, Facebook and Twitter User Profiling
Gabriella Kazai, Iskander Yusof, Daoud Clarke
Pages: 1129-1132
doi>10.1145/2911451.2911464

This demo presents a prototype mobile app that provides out-of-the-box personalised content recommendations to its users by leveraging and combining the user's location, their Facebook and/or Twitter feed and their in-app actions to automatically infer ...
PULP: A System for Exploratory Search of Scientific Literature
Alan Medlar, Kalle Ilves, Ping Wang, Wray Buntine, Dorota Glowacka
Pages: 1133-1136
doi>10.1145/2911451.2911455

Despite the growing importance of exploratory search, information retrieval (IR) systems tend to focus on lookup search. Lookup searches are well served by optimising the precision and recall of search results; however, for exploratory search this may ...
SECC: A Novel Search Engine Interface with Live Chat Channel
Cheng Zhang, Peng Zhang, Jingfei Li, Dawei Song
Pages: 1137-1140
doi>10.1145/2911451.2911454

Traditional information retrieval systems rank documents according to their relevance to users' input queries. State-of-the-art commercial search engines (SEs) train ranking models and suggest query refinements by exploiting collective intelligence implicitly ...
Simulating Interactive Information Retrieval: SimIIR: A Framework for the Simulation of Interaction
David Maxwell, Leif Azzopardi
Pages: 1141-1144
doi>10.1145/2911451.2911469

Simulation provides a powerful and cost-effective approach to explore and evaluate how interactions between a searcher and system influence search behaviour and performance. With a growing interest in simulation and an increasing number of papers using ...
The ComeWithMe System for Searching and Ranking Activity-Based Carpooling Rides
Vinicius Monteiro de Lira, Chiara Renso, Raffaele Perego, Salvatore Rinzivillo, Valeria Cesario Times
Pages: 1145-1148
doi>10.1145/2911451.2911459

ComeWithMe is an activity oriented carpooling service that enlarges the candidate destinations of a ride request by considering alternative places where the desired activity can be performed. It is based on the observation that individuals often move ...
ThingSeek: A Crawler and Search Engine for the Internet of Things
Ali Shemshadi, Quan Z. Sheng, Yongrui Qin
Pages: 1149-1152
doi>10.1145/2911451.2911471

The rapidly growing paradigm of the Internet of Things (IoT) requires new search engines, which can crawl heterogeneous data sources and search in highly dynamic contexts. Existing search engines cannot meet these requirements as they are designed for ...
Tweetviz: Visualizing Tweets for Business Intelligence
Bas Sijtsma, Pernilla Qvarfordt, Francine Chen
Pages: 1153-1156
doi>10.1145/2911451.2911470

Social media offers potential opportunities for businesses to extract business intelligence. This paper presents Tweetviz, an interactive tool to help businesses extract actionable information from a large set of noisy Twitter messages. Tweetviz visualizes ...
Where the Event Lies: Predicting Event Occurrence in Textual Documents
Andrea Ceroni, Ujwal Gadiraju, Jan Matschke, Simon Wingert, Marco Fisichella
Pages: 1157-1160
doi>10.1145/2911451.2911452

Manually inspecting text in a document collection to assess whether an event occurs in it is a cumbersome task. Although a manual inspection can allow one to identify and discard false events, it becomes infeasible with increasing numbers of automatically ...
SESSION: Doctoral Consortium
A Novel Approach to Define and Model Contextual Features in Recommender Systems
Parisa Lak
Pages: 1161-1161
doi>10.1145/2911451.2911481

Recommender systems (RS) provide more accurate and more relevant recommendations using contextual feature(s). This accuracy improvement comes at a computational cost. Therefore, finding and selecting the most relevant contextual features is ...
A Study of Information Seeking Behavior Using Physical and Online Explorations
Dongho Choi
Pages: 1163-1163
doi>10.1145/2911451.2911482

People have behavioral patterns through which they determine how to seek and use information. People also exhibit established mobility patterns in their everyday lives. Meanwhile, modern technologies such as smartphones, wearable devices, and ...
Appearance-Based Retrieval of Mathematical Notation in Documents and Lecture Videos
Kenny Davila
Pages: 1165-1165
doi>10.1145/2911451.2911477

Large data collections containing millions of math formulae in different formats are available on-line. Retrieving math expressions from these collections is challenging. Based on the notion that visually similar formulas are related, we propose a framework ...
Beyond Topical Relevance: Studying Understandability and Reliability in Consumer Health Search
Joao Palotti
Pages: 1167-1167
doi>10.1145/2911451.2911480

Nowadays people rely on search engines to explore, understand and manage their health. A recent study from Pew Internet states that one in three adult American Internet users has used the Internet as a diagnosis tool. Retrieving incorrect or unclear ...
Enhancing Information Retrieval with Adapted Word Embedding
Navid Rekabsaz
Pages: 1169-1169
doi>10.1145/2911451.2911475

Recent developments in word embedding provide a novel source of information for term-to-term similarity. A recurring question now is whether the provided term associations can be properly integrated into the traditional information retrieval models while ...
Fairness in Information Retrieval
Aldo Lipani
Pages: 1171-1171
doi>10.1145/2911451.2911473

The offline evaluation of Information Retrieval (IR) systems is performed through the use of test collections. A test collection, in its essence, is composed of a collection of documents, a set of topics, and a set of relevance assessments for each ...
Going Beyond Relevance: Incorporating Effort in Information Retrieval
Manisha Verma
Pages: 1173-1173
doi>10.1145/2911451.2911487

The primary focus of information retrieval (IR) systems has been to optimize for relevance. Existing approaches used to rank documents or evaluate IR systems do not account for "user effort". At present, relevance captures topical overlap between document ...
Measuring Interestingness of Political Documents
Hosein Azarbonyad
Pages: 1175-1175
doi>10.1145/2911451.2911485

Political texts are pervasive on the Web covering laws and policies in national and supranational jurisdictions. Access to this data is crucial for government transparency and accountability to the population. The main aim of our research is developing ...
Modeling User Feedback in Dynamic Search and Browsing
Jiyun Luo
Pages: 1177-1177
doi>10.1145/2911451.2911483

Nowadays, searching for complicated information needs is becoming more and more common. These complicated needs usually require users to reformulate their queries and conduct multiple retrievals in a search session. Many technologies are ...
Modelling User Search Behaviour Based on Process
Mengdie Zhuang
Pages: 1179-1179
doi>10.1145/2911451.2911486

Typically, interactive information retrieval (IIR) system evaluations assess search processes and outcomes using a combination of two types of measures: 1. user perception (e.g. users' attitudes towards the search experience and outcome); 2. user behaviour ...
Retrievability: An Independent Evaluation Measure
Colin Wilkie
Pages: 1181-1181
doi>10.1145/2911451.2911478

Information Retrieval systems have traditionally been evaluated in terms of efficiency and performance. These aspects of retrieval systems, whilst very important, do not cover a crucial aspect of the system, the access it provides to the documents of ...
Significant Words Representations of Entities
Mostafa Dehghani
Pages: 1183-1183
doi>10.1145/2911451.2911474

Transforming the data into a suitable representation is the first key step of data analysis, and the performance of any data-oriented method depends heavily on it. We study questions on how we can best learn representations for textual entities ...
Time-Quality Trade-offs in Search
Ryan Burton
Pages: 1185-1185
doi>10.1145/2911451.2911484

In this paper, I propose a research agenda surrounding the notion of slow search, where retrieval speed may be traded for improvements in result quality. This time-quality trade-off leads to a number of implications in the areas of human-computer interaction ...
Torii: Attribute-based Polarity Analysis with Big Datasets
Fernando O. Gallego
Pages: 1187-1187
doi>10.1145/2911451.2911479

Polarity analysis has become a key aspect of market analysis. The number of companies that are interested in the general opinion of the crowd regarding the items that they sell is increasing every day. Attribute-based polarity analysis is a fine-grained ...
User Interaction in Mobile Web Search
Jaewon Kim
Pages: 1189-1189
doi>10.1145/2911451.2911476

From previous studies, we believe that search behaviour on touch-enabled mobile devices is different from the behaviour with desktop screens. In the proposed research, we intend to explore user interaction while searching with the aim of improving search ...
TUTORIAL SESSION: Tutorials
Collaborative Information Seeking: Art and Science of Achieving 1+1>2 in IR
Chirag Shah
Pages: 1191-1194
doi>10.1145/2911451.2914801

Traditional IR techniques, systems, and methods that assume an individual searcher are often shown to be inadequate for addressing search problems that are multi-faceted and/or too complex or difficult for individuals. The next big leap in information ...
Constructing and Mining Web-scale Knowledge Graphs
Evgeniy Gabrilovich, Nicolas Usunier
Pages: 1195-1197
doi>10.1145/2911451.2914807

Recent years have witnessed a proliferation of large-scale knowledge graphs, from purely academic projects such as YAGO to major commercial projects such as Google's Knowledge Graph and Microsoft's Satori. Whereas there is a large body of research on ...
Counterfactual Evaluation and Learning for Search, Recommendation and Ad Placement
Thorsten Joachims, Adith Swaminathan
Pages: 1199-1201
doi>10.1145/2911451.2914803

Online metrics measured through A/B tests have become the gold standard for many evaluation questions. But can we get the same results as A/B tests without actually fielding a new system? And can we train systems to optimize online metrics without subjecting ...
Deep Learning for Information Retrieval
Hang Li, Zhengdong Lu
Pages: 1203-1206
doi>10.1145/2911451.2914800

Recent years have seen significant progress in information retrieval and natural language processing, with deep learning technologies being successfully applied to almost all of their major tasks. The key to the success of deep learning is its ...
From Design to Analysis: Conducting Controlled Laboratory Experiments with Users
Diane Kelly, Anita Crescenzi
Pages: 1207-1210
doi>10.1145/2911451.2914809

This full-day tutorial provides general instruction about the design of controlled laboratory experiments that are conducted in order to better understand human information interaction and retrieval. Different data collection methods and procedures are ...
Instant Search: A Hands-on Tutorial
Ganesh Venkataraman, Abhimanyu Lad, Viet Ha-Thuc, Dhruv Arya
Pages: 1211-1214
doi>10.1145/2911451.2914806

Instant search has become a common part of the search experience in most popular search engines and social networking websites. The goal is to provide instant feedback to the user in terms of query completions ("instant suggestions") or directly provide ...
Online Learning to Rank for Information Retrieval: SIGIR 2016 Tutorial
Artem Grotov, Maarten de Rijke
Pages: 1215-1218
doi>10.1145/2911451.2914798

During the past 10–15 years offline learning to rank has had a tremendous influence on information retrieval, both scientifically and in practice. Recently, as the limitations of offline learning to rank for information retrieval have become apparent, ...
Question Answering with Knowledge Base, Web and Beyond
Wen-tau Yih, Hao Ma
Pages: 1219-1221
doi>10.1145/2911451.2914804

In this tutorial, we give the audience a coherent overview of research on question answering (QA). We first introduce a variety of QA problems proposed by pioneering researchers and briefly describe the early efforts. By contrasting with the current ...
Scalability and Efficiency Challenges in Large-Scale Web Search Engines
B. Barla Cambazoglu, Ricardo Baeza-Yates
Pages: 1223-1226
doi>10.1145/2911451.2914808

Commercial web search engines need to process thousands of queries every second and provide responses to user queries within a few hundred milliseconds. As a consequence of these tight performance constraints, search engines construct and maintain very ...
Simulation of Interaction: A Tutorial on Modelling and Simulating User Interaction and Search Behaviour
Leif Azzopardi
Pages: 1227-1230
doi>10.1145/2911451.2914799

Search is an inherently interactive, non-deterministic and user-dependent process. This means that there are many different possible sequences of interactions which could be taken (some ending in success and others ending in failure). Simulation provides ...
Succinct Data Structures in Information Retrieval: Theory and Practice
Simon Gog, Rossano Venturini
Pages: 1231-1233
doi>10.1145/2911451.2914802

Succinct data structures are used today in many information retrieval applications, e.g., posting lists representation, language model representation, indexing (social) graphs, query auto-completion, document retrieval and indexing dictionaries of strings, ...
Temporal Information Retrieval
Nattiya Kanhabua, Avishek Anand
Pages: 1235-1238
doi>10.1145/2911451.2914805

The study of temporal dynamics and its impact can be framed within the so-called temporal IR approaches, which explain how user behavior, document content and scale vary with time, and how we can use them to our advantage to improve retrieval effectiveness. ...
WORKSHOP SESSION: Workshops
Third International Workshop on Gamification for Information Retrieval (GamifIR'16)
Michael Meder, Frank Hopfgartner, Gabriella Kazai, Udo Kruschwitz
Pages: 1239-1240
doi>10.1145/2911451.2917759

Stronger engagement and greater participation are often crucial to reaching a goal or solving issues such as the emerging employee engagement crisis, insufficient knowledge sharing, and chronic procrastination. In many cases we need and search for ...
HIA'16: The 2nd International Workshop on Heterogeneous Information Access at SIGIR 2016
Ke Zhou, Yiqun Liu, Roger Jie Luo, Joemon Jose
Pages: 1241-1241
doi>10.1145/2911451.2917760

Information access is becoming increasingly heterogeneous. Especially when the user's information need is exploratory, returning a set of diverse results from different resources could benefit the user. For example, when a user is planning ...
Medical Information Search Workshop (MEDIR)
Steven Bedrick, Lorraine Goeuriot, Gareth J.F. Jones, Anastasia Krithara, Henning Mueller, George Paliouras
Pages: 1243-1243
doi>10.1145/2911451.2917761
Neu-IR: The SIGIR 2016 Workshop on Neural Information Retrieval
Nick Craswell, W. Bruce Croft, Jiafeng Guo, Bhaskar Mitra, Maarten de Rijke
Pages: 1245-1246
doi>10.1145/2911451.2917762

In recent years, deep neural networks have yielded significant performance improvements on speech recognition and computer vision tasks, and have led to exciting breakthroughs in novel application areas such as automatic voice translation, image captioning, ...
Privacy-Preserving IR 2016: Differential Privacy, Search, and Social Media
Hui Yang, Ian Soboroff, Li Xiong, Charles L.A. Clarke, Simson L. Garfinkel
Pages: 1247-1248
doi>10.1145/2911451.2917763

Due to the lack of mature techniques in privacy-preserving information retrieval (IR), concerns about information privacy and security have become serious obstacles that prevent valuable user data from being used in IR research such as studies on query logs, ...
Search as Learning (SAL) Workshop 2016
Jacek Gwizdka, Preben Hansen, Claudia Hauff, Jiyin He, Noriko Kando
Pages: 1249-1250
doi>10.1145/2911451.2917766

The "Search as Learning" (SAL) workshop is focused on an area within the information retrieval field that is only beginning to emerge: supporting users in their learning whilst interacting with information content.
SIGIR 2016 Workshop WebQA II: Web Question Answering Beyond Factoids
Alessandro Moschitti, Lluís Márquez, Preslav Nakov, Eugene Agichtein, Charles Clarke, Idan Szpektor
Pages: 1251-1252
doi>10.1145/2911451.2917767

Web search engines have made great progress at answering factoid queries. However, they are not well-tailored for managing more complex questions, especially when they require explanation and/or description. The WebQA workshop series aims at exploring ...

Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval
Table of Contents
SESSION: Salton Award
Charlie Clarke
Salton Award Lecture: People, Interacting with Information
Nicholas J. Belkin
Pages: 1-2
doi>10.1145/2766462.2767854

Colleagues, friends, let me begin by expressing how pleased, and humbly honored I am to be a recipient of the Gerard Salton Award. Gerry was a great man, and to receive the award named for him is very special. For me personally, it is especially meaningful, ...
SESSION: Session 1A: Assisting the Search
Ellen Voorhees
Exploring Session Context using Distributed Representations of Queries and Reformulations
Bhaskar Mitra
Pages: 3-12
doi>10.1145/2766462.2767702

Search logs contain examples of frequently occurring patterns of user reformulations of queries. Intuitively, the reformulation "San Francisco" -- "San Francisco 49ers" is semantically similar to "Detroit" -- "Detroit Lions". Likewise, "London" -- "things ...
An Eye-Tracking Study of Query Reformulation (Honorable Mention)
Carsten Eickhoff, Sebastian Dungs, Vu Tran
Pages: 13-22
doi>10.1145/2766462.2767703

Information about a user's domain knowledge and interest can be important signals for many information retrieval tasks such as query suggestion or result ranking. State-of-the-art user models rely on coarse-grained representations of the user's previous ...
Differences in the Use of Search Assistance for Tasks of Varying Complexity
Robert Capra, Jaime Arguello, Anita Crescenzi, Emily Vardell
Pages: 23-32
doi>10.1145/2766462.2767741

In this paper, we study how users interact with a search assistance tool while completing tasks of varying complexity. We designed a novel tool referred to as the search guide (SG) that displays the search trails (queries issued, results clicked, pages ...
SESSION: Session 1B: Multimedia
Doug Oard
Dynamic Query Modeling for Related Content Finding
Daan Odijk, Edgar Meij, Isaac Sijaranamual, Maarten de Rijke
Pages: 33-42
doi>10.1145/2766462.2767715

While watching television, people increasingly consume additional content related to what they are watching. We consider the task of finding video content related to a live television broadcast for which we leverage the textual stream of subtitles associated ...
Image-Based Recommendations on Styles and Substitutes
Julian McAuley, Christopher Targett, Qinfeng Shi, Anton van den Hengel
Pages: 43-52
doi>10.1145/2766462.2767755

Humans inevitably develop a sense of the relationships between objects, some of which are based on their appearance. Some pairs of objects might be seen as being alternatives to each other (such as two pairs of jeans), while others may be seen as being ...
Semi-supervised Hashing with Semantic Confidence for Large Scale Visual Search
Yingwei Pan, Ting Yao, Houqiang Li, Chong-Wah Ngo, Tao Mei
Pages: 53-62
doi>10.1145/2766462.2767725

Similarity search is one of the fundamental problems for large scale multimedia applications. Hashing techniques, as one popular strategy, have been intensively investigated owing to their speed and memory efficiency. Recent research has shown that leveraging ...
SESSION: Session 1C: Efficient Algorithms
Andrew Trotman
Optimal Aggregation Policy for Reducing Tail Latency of Web Search
Jeong-Min Yun, Yuxiong He, Sameh Elnikety, Shaolei Ren
Pages: 63-72
doi>10.1145/2766462.2767708

A web search engine often employs a partition-aggregate architecture, where an aggregator propagates a user query to all index serving nodes (ISNs) and collects the responses from them. An aggregation policy determines how long the aggregators wait for ...
QuickScorer: A Fast Algorithm to Rank Documents with Additive Ensembles of Regression Trees (Best Paper)
Claudio Lucchese, Franco Maria Nardini, Salvatore Orlando, Raffaele Perego, Nicola Tonellotto, Rossano Venturini
Pages: 73-82
doi>10.1145/2766462.2767733

Learning-to-Rank models based on additive ensembles of regression trees have proven to be very effective for ranking query results returned by Web search engines, a scenario where quality and efficiency requirements are very demanding. Unfortunately, ...
High Quality Graph-Based Similarity Search
Weiren Yu, Julie Ann McCann
Pages: 83-92
doi>10.1145/2766462.2767720

SimRank is an influential link-based similarity measure that has been used in many fields of Web search and sociometry. The best-of-breed method by Kusumoto et al., however, does not always deliver high-quality results, since it fails to accurately ...
SESSION: Session 2A: Diversity and Bias
Gareth Jones
Summarizing Contrastive Themes via Hierarchical Non-Parametric Processes
Zhaochun Ren, Maarten de Rijke
Pages: 93-102
doi>10.1145/2766462.2767713

Given a topic of interest, a contrastive theme is a group of opposing pairs of viewpoints. We address the task of summarizing contrastive themes: given a set of opinionated documents, select meaningful sentences to represent contrastive themes present ...
Splitting Water: Precision and Anti-Precision to Reduce Pool Bias
Aldo Lipani, Mihai Lupu, Allan Hanbury
Pages: 103-112
doi>10.1145/2766462.2767749

For many tasks in evaluation campaigns, especially those modeling narrow domain-specific challenges, lack of participation leads to a potential pooling bias due to the scarce number of pooled runs. It is well known that the reliability of a test collection ...
Learning Maximal Marginal Relevance Model via Directly Optimizing Diversity Evaluation Measures
Long Xia, Jun Xu, Yanyan Lan, Jiafeng Guo, Xueqi Cheng
Pages: 113-122
doi>10.1145/2766462.2767710

In this paper we address the issue of learning a ranking model for search result diversification. In this task, a model that considers both query-document relevance and document diversity is automatically created from training data. Ideally a diverse ranking ...
SESSION: Session 2B: Queries
Milad Shokouhi
Analyzing User's Sequential Behavior in Query Auto-Completion via Markov Processes
Liangda Li, Hongbo Deng, Anlei Dong, Yi Chang, Hongyuan Zha, Ricardo Baeza-Yates
Pages: 123-132
doi>10.1145/2766462.2767723

Query auto-completion (QAC) plays an important role in helping users type less while submitting a query. The QAC engine generally offers a list of suggested queries that start with a user's input as a prefix, and the list of suggestions is changed ...
Learning by Example: Training Users with High-quality Query Suggestions
Morgan Harvey, Claudia Hauff, David Elsweiler
Pages: 133-142
doi>10.1145/2766462.2767731

The queries submitted by users to search engines often poorly describe their information needs and represent a potential bottleneck in the system. In this paper we investigate to what extent it is possible to aid users in learning how to formulate better ...
adaQAC: Adaptive Query Auto-Completion via Implicit Negative Feedback
Aston Zhang, Amit Goyal, Weize Kong, Hongbo Deng, Anlei Dong, Yi Chang, Carl A. Gunter, Jiawei Han
Pages: 143-152
doi>10.1145/2766462.2767697

Query auto-completion (QAC) facilitates user query composition by suggesting queries given query prefix inputs. In 2014, global users of Yahoo! Search saved more than 50% of keystrokes when submitting English queries by selecting suggestions of QAC. Users' ...
SESSION: Session 2C: Graphs
Jaap Kamps
A Random Walk Model for Optimization of Search Impact in Web Frontier Ranking
Giang Tran, Ata Turk, B. Barla Cambazoglu, Wolfgang Nejdl
Pages: 153-162
doi>10.1145/2766462.2767737

Large-scale web search engines need to crawl the Web continuously to discover and download newly created web content. The speed at which the new content is discovered and the quality of the discovered content can have a big impact on the coverage and ...
A Similarity Measure for Weaving Patterns in Textiles
Sven Helmer, Vuong Minh Ngo
Pages: 163-172
doi>10.1145/2766462.2767735

We propose a novel approach for measuring the similarity between weaving patterns that can provide similarity-based search functionality for textile archives. We represent textile structures using hypergraphs and extract multisets of k-neighborhoods ...
Local Ranking Problem on the BrowseGraph
Michele Trevisiol, Luca Maria Aiello, Paolo Boldi, Roi Blanco
Pages: 173-182
doi>10.1145/2766462.2767704

The "Local Ranking Problem" (LRP) is related to the computation of a centrality-like rank on a local graph, where the scores of the nodes could significantly differ from the ones computed on the global graph. Previous work has studied LRP on the hyperlink ...
SESSION: Session 3A: Search Experience
Birger Larsen
How many results per page?: A Study of SERP Size, Search Behavior and User Experience
Diane Kelly, Leif Azzopardi
Pages: 183-192
doi>10.1145/2766462.2767732

The provision of "ten blue links" has emerged as the standard for the design of search engine result pages (SERPs). While numerous aspects of SERPs have been examined, little attention has been paid to the number of results displayed per page. This paper ...
Influence of Vertical Result in Web Search Examination
Zeyang Liu, Yiqun Liu, Ke Zhou, Min Zhang, Shaoping Ma
Pages: 193-202
doi>10.1145/2766462.2767714

Research in how users examine results on search engine result pages (SERPs) helps improve result ranking, advertisement placement, performance evaluation and search UI design. Although examination behavior on organic search results (also known as "ten ...
Unconscious Physiological Effects of Search Latency on Users and Their Click Behaviour
Miguel Barreda-Ángeles, Ioannis Arapakis, Xiao Bai, B. Barla Cambazoglu, Alexandre Pereda-Baños
Pages: 203-212
doi>10.1145/2766462.2767719

Understanding the impact of a search system's response latency on its users' searching behaviour has recently been an active research topic in the information retrieval and human-computer interaction areas. Along the same line, this paper focuses on ...
SESSION: Session 3B: Social Media
Claudia Hauff
Multiple Social Network Learning and Its Application in Volunteerism Tendency Prediction
Xuemeng Song, Liqiang Nie, Luming Zhang, Mohammad Akbari, Tat-Seng Chua
Pages: 213-222
doi>10.1145/2766462.2767726

We are living in the era of social networks, where people throughout the world are connected and organized by multiple social networks. The views revealed by different social networks may vary according to the different services they offer. They are ...
HSpam14: A Collection of 14 Million Tweets for Hashtag-Oriented Spam Research
Surendra Sedhai, Aixin Sun
Pages: 223-232
doi>10.1145/2766462.2767701

Hashtags facilitate information diffusion on Twitter by creating dynamic and virtual communities for information aggregation from all Twitter users. Because hashtags serve as additional channels for one's tweets to be potentially accessed by other users ...
Uncovering Crowdsourced Manipulation of Online Reviews
Amir Fayazi, Kyumin Lee, James Caverlee, Anna Squicciarini
Pages: 233-242
doi>10.1145/2766462.2767742

Online reviews are a cornerstone of consumer decision making. However, their authenticity and quality have proven hard to control, especially as polluters target these reviews toward promoting products or degrading competitors. In a troubling direction, ...
SESSION: Session 3C: Entities
Krisztian Balog
Relevance Scores for Triples from Type-Like Relations
Hannah Bast, Björn Buchhold, Elmar Haussmann
Pages: 243-252
doi>10.1145/2766462.2767734

We compute and evaluate relevance scores for knowledge-base triples from type-like relations. Such a score measures the degree to which an entity "belongs" to a type. For example, Quentin Tarantino has various professions, including Film Director, Screenwriter, ...
Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of Data
Nikita Zhiltsov, Alexander Kotov, Fedor Nikolaev
Pages: 253-262
doi>10.1145/2766462.2767756

Previously proposed approaches to ad-hoc entity retrieval in the Web of Data (ERWD) used multi-fielded representation of entities and relied on standard unigram bag-of-words retrieval models. Although retrieval models incorporating term dependencies ...
Mining, Ranking and Recommending Entity Aspects
Ridho Reinanda, Edgar Meij, Maarten de Rijke
Pages: 263-272
doi>10.1145/2766462.2767724

Entity queries constitute a large fraction of web search queries and most of these queries are in the form of an entity mention plus some context terms that represent an intent in the context of that entity. We refer to these entity-oriented search intents ...
SESSION: Session 4A: User Models
Diane Kelly
Bayesian Ranker Comparison Based on Historical User Interactions
Artem Grotov, Shimon Whiteson, Maarten de Rijke
Pages: 273-282
doi>10.1145/2766462.2767730

We address the problem of how to safely compare rankers for information retrieval. In particular, we consider how to control the risks associated with switching from an existing production ranker to a new candidate ranker. Whereas existing online comparison ...
Incorporating Non-sequential Behavior into Click Models (Honorable Mention)
Chao Wang, Yiqun Liu, Meng Wang, Ke Zhou, Jian-yun Nie, Shaoping Ma
Pages: 283-292
doi>10.1145/2766462.2767712

Click-through information is considered a valuable source of users' implicit relevance feedback. As user behavior is usually influenced by a number of factors such as position, presentation style and site reputation, researchers have proposed a variety ...
Untangling Result List Refinement and Ranking Quality: a Framework for Evaluation and Prediction
Jiyin He, Marc Bron, Arjen de Vries, Leif Azzopardi, Maarten de Rijke
Pages: 293-302
doi>10.1145/2766462.2767740

Traditional batch evaluation metrics assume that user interaction with search results is limited to scanning down a ranked list. However, modern search interfaces come with additional elements supporting result list refinement (RLR) through facets and ...
SESSION: Session 4B: Recommending
Paul Bennett
WEMAREC: Accurate and Scalable Recommendation through Weighted and Ensemble Matrix Approximation
Chao Chen, Dongsheng Li, Yingying Zhao, Qin Lv, Li Shang
Pages: 303-312
doi>10.1145/2766462.2767718

Matrix approximation is one of the most effective methods for collaborative filtering-based recommender systems. However, the high computation complexity of matrix factorization on large datasets limits its scalability. Prior solutions have adopted co-clustering ...
Effective Latent Models for Binary Feedback in Recommender Systems
Maksims Volkovs, Guang Wei Yu
Pages: 313-322
doi>10.1145/2766462.2767716

In many collaborative filtering (CF) applications, latent approaches are the preferred model choice due to their ability to generate real-time recommendations efficiently. However, the majority of existing latent models are not designed for implicit ...
Personalized Recommendation via Parameter-Free Contextual Bandits
Liang Tang, Yexi Jiang, Lei Li, Chunqiu Zeng, Tao Li
Pages: 323-332
doi>10.1145/2766462.2767707

Personalized recommendation services have gained increasing popularity and attention in recent years as most useful information can be accessed online in real-time. Most online recommender systems try to address the information needs of users by virtue ...
SESSION: Session 4C: Classifying & Ranking
Yi Chang
An Efficient and Scalable MetaFeature-based Document Classification Approach based on Massively Parallel Computing
Sérgio Canuto, Marcos Gonçalves, Wisllay Santos, Thierson Rosa, Wellington Martins
Pages: 333-342
doi>10.1145/2766462.2767743

The unprecedented growth of available data nowadays has stimulated the development of new methods for organizing and extracting useful knowledge from this immense amount of data. Automatic Document Classification (ADC) is one such method, which uses ...
Listwise Collaborative Filtering
Shanshan Huang, Shuaiqiang Wang, Tie-Yan Liu, Jun Ma, Zhumin Chen, Jari Veijalainen
Pages: 343-352
doi>10.1145/2766462.2767693

Recently, ranking-oriented collaborative filtering (CF) algorithms have achieved great success in recommender systems. They obtained state-of-the-art performance by estimating a preference ranking of items for each user rather than estimating the absolute ...
BROOF: Exploiting Out-of-Bag Errors, Boosting and Random Forests for Effective Automated Classification
Thiago Salles, Marcos Gonçalves, Victor Rodrigues, Leonardo Rocha
Pages: 353-362
doi>10.1145/2766462.2767747

Random Forests (RF) and Boosting are two of the most successful supervised learning paradigms for automatic classification. In this work we propose to combine both strategies in order to exploit their strengths while simultaneously solving some of their ...
SESSION: Session 5A: Deep Learning
Berthier Ribeiro-Neto
Monolingual and Cross-Lingual Information Retrieval Models Based on (Bilingual) Word Embeddings
Ivan Vulić, Marie-Francine Moens
Pages: 363-372
doi>10.1145/2766462.2767752

We propose a new unified framework for monolingual (MoIR) and cross-lingual information retrieval (CLIR) which relies on the induction of dense real-valued word vectors known as word embeddings (WE) from comparable data. To this end, we make several ...
Learning to Rank Short Text Pairs with Convolutional Deep Neural Networks
Aliaksei Severyn, Alessandro Moschitti
Pages: 373-382
doi>10.1145/2766462.2767738

Learning a similarity function between pairs of objects is at the core of learning to rank approaches. In information retrieval tasks we typically deal with query-document pairs; in question answering, with question-answer pairs. However, before learning ...
Context- and Content-aware Embeddings for Query Rewriting in Sponsored Search
Mihajlo Grbovic, Nemanja Djuric, Vladan Radosavljevic, Fabrizio Silvestri, Narayan Bhamidipati
Pages: 383-392
doi>10.1145/2766462.2767709

Search engines represent one of the most popular web services, visited by more than 85% of internet users on a daily basis. Advertisers are interested in making use of this vast business potential, as a very clear intent signal is communicated through the ...
SESSION: Session 5B: Products
Grace Hui Yang
Retrieval of Relevant Opinion Sentences for New Products
Dae Hoon Park, Hyun Duk Kim, ChengXiang Zhai, Lifan Guo
Pages: 393-402
doi>10.1145/2766462.2767748

With the rapid development of the Internet and e-commerce, abundant product reviews have been written by consumers who bought the products. These reviews are very useful for consumers to optimize their purchasing decisions. However, since the reviews are ...
Learning Hierarchical Representation Model for NextBasket Recommendation
Pengfei Wang, Jiafeng Guo, Yanyan Lan, Jun Xu, Shengxian Wan, Xueqi Cheng
Pages: 403-412
doi>10.1145/2766462.2767694

Next basket recommendation is a crucial task in market basket analysis. Given a user's purchase history, usually a sequence of transaction data, one attempts to build a recommender that can predict the next few items that the user most probably would ...
Parametric and Non-parametric User-aware Sentiment Topic Models
Zaihan Yang, Alexander Kotov, Aravind Mohan, Shiyong Lu
Pages: 413-422
doi>10.1145/2766462.2767758

The popularity of Web 2.0 has resulted in a large number of publicly available online consumer reviews created by a demographically diverse user base. Information about the authors of these reviews, such as age, gender and location, provided by many ...
SESSION: Session 5C: Locations
Craig Macdonald
Learning to Extract Local Events from the Web
John Foley, Michael Bendersky, Vanja Josifovski
Pages: 423-432
doi>10.1145/2766462.2767739

The goal of this work is extraction and retrieval of local events from web pages. Examples of local events include small venue concerts, theater performances, garage sales, movie screenings, etc. We collect these events in the form of retrievable calendar ...
Rank-GeoFM: A Ranking based Geographical Factorization Method for Point of Interest Recommendation
Xutao Li, Gao Cong, Xiao-Li Li, Tuan-Anh Nguyen Pham, Shonali Krishnaswamy
Pages: 433-442
doi>10.1145/2766462.2767722

With the rapid growth of location-based social networks, Point of Interest (POI) recommendation has become an important research problem. However, the scarcity of the check-in data, a type of implicit feedback data, poses a severe challenge for existing ...
GeoSoCa: Exploiting Geographical, Social and Categorical Correlations for Point-of-Interest Recommendations
Jia-Dong Zhang, Chi-Yin Chow
Pages: 443-452
doi>10.1145/2766462.2767711

Recommending users with their preferred points-of-interest (POIs), e.g., museums and restaurants, has become an important feature for location-based social networks (LBSNs), which helps people explore new places and businesses discover potential ...
SESSION: Session 6A: Experiment Design
Alistair Moffat
Optimised Scheduling of Online Experiments
Eugene Kharitonov, Craig Macdonald, Pavel Serdyukov, Iadh Ounis
Pages: 453-462
doi>10.1145/2766462.2767706

Modern search engines increasingly rely on online evaluation methods such as A/B tests and interleaving. These online evaluation methods make use of interactions by the search engine's users to test various changes in the search engine. However, since ...
Predicting Search Satisfaction Metrics with Interleaved Comparisons
Anne Schuth, Katja Hofmann, Filip Radlinski
Pages: 463-472
doi>10.1145/2766462.2767695

The gold standard for online retrieval evaluation is AB testing. Rooted in the idea of a controlled experiment, AB tests compare the performance of an experimental system (treatment) on one sample of the user population, to that of a baseline system ...
Sequential Testing for Early Stopping of Online Experiments (Best Student Paper)
Eugene Kharitonov, Aleksandr Vorobev, Craig Macdonald, Pavel Serdyukov, Iadh Ounis
Pages: 473-482
doi>10.1145/2766462.2767729

Online evaluation methods, such as A/B and interleaving experiments, are widely used for search engine evaluation. Since they rely on noisy implicit user feedback, running each experiment takes considerable time. Recently, the problem of reducing the ...
SESSION: Session 6B: Predicting
Djoerd Hiemstra
Inferring Searcher Attention by Jointly Modeling User Interactions and Content Salience
Dmitry Lagun, Eugene Agichtein
Pages: 483-492
doi>10.1145/2766462.2767745

Modeling and predicting user attention is crucial for interpreting search behavior. The numerous applications include quantifying web search satisfaction, estimating search quality, and measuring and predicting online user engagement. While prior research ...
Different Users, Different Opinions: Predicting Search Satisfaction with Mouse Movement Information
Yiqun Liu, Ye Chen, Jinhui Tang, Jiashen Sun, Min Zhang, Shaoping Ma, Xuan Zhu
Pages: 493-502
doi>10.1145/2766462.2767721

Satisfaction prediction is one of the prime concerns in search performance evaluation. It is a non-trivial task for two major reasons: (1) The definition of satisfaction is rather subjective and different users may have different opinions in satisfaction ...
Predicting Search Intent Based on Pre-Search Context
Weize Kong, Rui Li, Jie Luo, Aston Zhang, Yi Chang, James Allan
Pages: 503-512
doi>10.1145/2766462.2767757

While many studies have been conducted on query understanding, there is limited understanding on why users start searches and how to predict search intent. In this paper, we propose to study this important but less explored problem. Our key intuition ...
SESSION: Session 6C: Tasks and Devices
Emine Yilmaz
Leveraging Procedural Knowledge for Task-oriented Search
Zi Yang, Eric Nyberg
Pages: 513-522
doi>10.1145/2766462.2767744

Many search engine users attempt to satisfy an information need by issuing multiple queries, with the expectation that each result will contribute some portion of the required information. Previous research has shown that structured or semi-structured ...
Personalizing Search on Shared Devices
Ryen W. White, Ahmed Hassan Awadallah
Pages: 523-532
doi>10.1145/2766462.2767736

Search personalization tailors the search experience to individual searchers. To do this, search engines construct interest models comprising signals from observed behavior associated with machines, often via Web browser cookies or other user identifiers. ...
Leveraging User Reviews to Improve Accuracy for Mobile App Retrieval
Dae Hoon Park, Mengwen Liu, ChengXiang Zhai, Haohong Wang
Pages: 533-542
doi>10.1145/2766462.2767759

Smartphones and tablets with their apps pervaded our everyday life, leading to a new demand for search tools to help users find the right apps to satisfy their immediate needs. While there are a few commercial mobile app search engines available, the ...
SESSION: Keynote
Ricardo Baeza-Yates
Towards a Game-Theoretic Framework for Information Retrieval
ChengXiang Zhai
Pages: 543-543
doi>10.1145/2766462.2767853

The task of information retrieval (IR) has traditionally been defined as to rank a collection of documents in response to a query. While this definition has enabled most research progress in IR so far, it does not model accurately the actual retrieval ...
SESSION: Session 7A: Assessing
Justin Zobel
Representative & Informative Query Selection for Learning to Rank using Submodular Functions
Rishabh Mehrotra, Emine Yilmaz
Pages: 545-554
doi>10.1145/2766462.2767753

The performance of Learning to Rank algorithms strongly depend on the number of labelled queries in the training set, while the cost incurred in annotating a large number of queries with relevance judgements is prohibitively high. As a result, constructing ...
Impact of Surrogate Assessments on High-Recall Retrieval
Adam Roegiest, Gordon V. Cormack, Charles L.A. Clarke, Maura R. Grossman
Pages: 555-564
doi>10.1145/2766462.2767754

We are concerned with the effect of using a surrogate assessor to train a passive (i.e., batch) supervised-learning method to rank documents for subsequent review, where the effectiveness of the ranking will be evaluated using a different assessor deemed ...
The Benefits of Magnitude Estimation Relevance Assessments for Information Retrieval Evaluation (Honorable Mention)
Andrew Turpin, Falk Scholer, Stefano Mizzaro, Eddy Maddalena
Pages: 565-574
doi>10.1145/2766462.2767760

Magnitude estimation is a psychophysical scaling technique for the measurement of sensation, where observers assign numbers to stimuli in response to their perceived intensity. We investigate the use of magnitude estimation for judging the relevance ...
SESSION: Session 7B: Terms
Arjen de Vries
Learning to Reweight Terms with Distributed Representations
Guoqing Zheng, Jamie Callan
Pages: 575-584
doi>10.1145/2766462.2767700

Term weighting is a fundamental problem in IR research and numerous weighting models have been proposed. Proper term weighting can greatly improve retrieval accuracies, which essentially involves two types of query understanding: interpreting the query ...
A Probabilistic Model for Information Retrieval Based on Maximum Value Distribution
Jiaul H. Paik
Pages: 585-594
doi>10.1145/2766462.2767762

The main goal of a retrieval model is to measure the degree of relevance of a document with respect to the given query. Probabilistic models are widely used to measure the likelihood of relevance of a document by combining within document term frequency ...
Non-Compositional Term Dependence for Information Retrieval
Christina Lioma, Jakob Grue Simonsen, Birger Larsen, Niels Dalum Hansen
Pages: 595-604
doi>10.1145/2766462.2767717

Modelling term dependence in IR aims to identify co-occurring terms that are too heavily dependent on each other to be treated as a bag of words, and to adapt the indexing and ranking accordingly. Dependent terms are predominantly identified using lexical ...
SESSION: Session 8A: Variability in test collections
Mark Smucker
On the Relation Between Assessor's Agreement and Accuracy in Gamified Relevance Assessment
Olga Megorskaya, Vladimir Kukushkin, Pavel Serdyukov
Pages: 605-614
doi>10.1145/2766462.2767727

Expert judgments (labels) are widely used in Information Retrieval for the purposes of search quality evaluation and machine learning. Setting up the process of collecting such judgments is a challenge of its own, and the maintenance of judgments quality ...
Assessor Differences and User Preferences in Tweet Timeline Generation
Yulu Wang, Garrick Sherman, Jimmy Lin, Miles Efron
Pages: 615-624
doi>10.1145/2766462.2767699

In information retrieval evaluation, when presented with an effectiveness difference between two systems, there are three relevant questions one might ask. First, are the differences statistically significant? Second, is the comparison stable with respect ...
User Variability and IR System Evaluation
Peter Bailey, Alistair Moffat, Falk Scholer, Paul Thomas
Pages: 625-634
doi>10.1145/2766462.2767728

Test collection design eliminates sources of user variability to make statistical comparisons among information retrieval (IR) systems more affordable. Does this choice unnecessarily limit generalizability of the outcomes to real usage scenarios? We ...
SESSION: Session 8B: Citations
Mark Sanderson
An Entity Class-Dependent Discriminative Mixture Model for Cumulative Citation Recommendation
Jingang Wang, Dandan Song, Qifan Wang, Zhiwei Zhang, Luo Si, Lejian Liao, Chin-Yew Lin
Pages: 635-644
doi>10.1145/2766462.2767698

This paper studies Cumulative Citation Recommendation (CCR) for Knowledge Base Acceleration (KBA). The CCR task aims to detect potential citations of a set of target entities with priorities from a volume of temporally-ordered stream corpus. Previous ...
Scientific Information Understanding via Open Educational Resources (OER)
Xiaozhong Liu, Zhuoren Jiang, Liangcai Gao
Pages: 645-654
doi>10.1145/2766462.2767750

Scientific publication retrieval/recommendation has been investigated in the past decade. However, to the best of our knowledge, few efforts have been made to help junior scholars and graduate students to understand and consume the essence of those scientific ...
In Situ Insights
Yuanhua Lv, Ariel Fuxman
Pages: 655-664
doi>10.1145/2766462.2767696

When consuming content in applications such as e-readers, word processors, and Web browsers, users often see mentions to topics (or concepts) that attract their attention. In a scenario of significant practical interest, topics are explored in situ, ...
SESSION: Session 9A: Streams
Fernando Diaz
Islands in the Stream: A Study of Item Recommendation within an Enterprise Social Stream
Ido Guy, Roy Levin, Tal Daniel, Ella Bolshinsky
Pages: 665-674
doi>10.1145/2766462.2767746

Social streams allow users to receive updates from their network by syndicating social media activity. These streams have become a popular way to share and consume information both on the web and in the enterprise. With so much activity going on, filtering ...
Evaluating Streams of Evolving News Events
Gaurav Baruah, Mark D. Smucker, Charles L.A. Clarke
Pages: 675-684
doi>10.1145/2766462.2767751

People track news events according to their interests and available time. For a major event of great personal interest, they might check for updates several times an hour, taking time to keep abreast of all aspects of the evolving event. For minor events ...
SESSION: Session 9B: Cards
Eugene Agichtein
Information Retrieval as Card Playing: A Formal Model for Optimizing Interactive Retrieval Interface
Yinan Zhang, Chengxiang Zhai
Pages: 685-694
doi>10.1145/2766462.2767761

We propose a novel formal model for optimizing interactive information retrieval interfaces. To model interactive retrieval in a general way, we frame the task of an interactive retrieval system as to choose a sequence of interface cards to present to ...
From Queries to Cards: Re-ranking Proactive Card Recommendations Based on Reactive Search History
Milad Shokouhi, Qi Guo
Pages: 695-704
doi>10.1145/2766462.2767705

The growing accessibility of mobile devices has substantially reformed the way users access information. While the reactive search by query remains as common as before, recent years have witnessed the emergence of various proactive systems such as Google ...
SESSION: Short Papers
Using Sensor Metadata Streams to Identify Topics of Local Events in the City
M-Dyaa Albakour, Craig Macdonald, Iadh Ounis
Pages: 711-714
doi>10.1145/2766462.2767837

In this paper, we study the emerging Information Retrieval (IR) task of local event retrieval using sensor metadata streams. Sensor metadata streams include information such as the crowd density from video processing, audio classifications, and social ...
StarSum: A Simple Star Graph for Multi-document Summarization
Mohammed Al-Dhelaan
Pages: 715-718
doi>10.1145/2766462.2767790

Graph-based approaches for multi-document summarization have been widely used to extract top sentences for a summary. Traditionally, the documents' cluster is modeled as a graph of the cluster's sentences only which might limit the ability of recognizing ...
When Relevance Judgement is Happening?: An EEG-based Study
Marco Allegretti, Yashar Moshfeghi, Maria Hadjigeorgieva, Frank E. Pollick, Joemon M. Jose, Gabriella Pasi
Pages: 719-722
doi>10.1145/2766462.2767811

Relevance is a central notion in Information Retrieval, but it is considered to be a difficult concept to define. We analyse brain signals for the first 800 milliseconds (ms) of a relevance assessment process to answer the question "when relevance is ...
Search Engine Evaluation based on Search Engine Switching Prediction
Olga Arkhipova, Lidia Grauer, Igor Kuralenok, Pavel Serdyukov
Pages: 723-726
doi>10.1145/2766462.2767786

In this paper we present a novel application of the search engine switching prediction model for online evaluation. We propose a new metric pSwitch for A/B-testing, which allows us to evaluate the quality of search engines in different aspects such as ...
Time-Aware Authorship Attribution for Short Text Streams
Hosein Azarbonyad, Mostafa Dehghani, Maarten Marx, Jaap Kamps
Pages: 727-730
doi>10.1145/2766462.2767799

Identifying authors of short texts on Internet or social media based communication systems is an important tool against fraud and cybercrimes. Besides the challenges raised by the limited length of these short messages, evolving language and writing ...
A Priori Relevance Based On Quality and Diversity Of Social Signals
Ismail Badache, Mohand Boughanem
Pages: 731-734
doi>10.1145/2766462.2767807

Social signals (users' actions) associated with web resources (documents) can be considered as an additional information that can play a role to estimate a priori importance of the resource. In this paper, we are particularly interested in: first, showing ...
Document Comprehensiveness and User Preferences in Novelty Search Tasks
Ashraf Bah, Praveen Chandar, Ben Carterette
Pages: 735-738
doi>10.1145/2766462.2767820

Different users may be attempting to satisfy different information needs while providing the same query to a search engine. Addressing that issue is addressing Novelty and Diversity in information retrieval. Novelty and Diversity search task models the ...
Cost-Aware Result Caching for Meta-Search Engines
Emre Bakkal, Ismail Sengor Altingovde, Ismail Hakki Toroslu
Pages: 739-742
doi>10.1145/2766462.2767813

Our goal in this paper is to design cost-aware result caching approaches for meta-search engines. We introduce different levels of eviction, namely, query-, resource- and entry-level, based on the granularity of the entries to be evicted from the cache ...
From Unlabelled Tweets to Twitter-specific Opinion Words
Felipe Bravo-Marquez, Eibe Frank, Bernhard Pfahringer
Pages: 743-746
doi>10.1145/2766462.2767770

In this article, we propose a word-level classification model for automatically generating a Twitter-specific opinion lexicon from a corpus of unlabelled tweets. The tweets from the corpus are represented by two vectors: a bag-of-words vector and a semantic ...
The Best Published Result is Random: Sequential Testing and its Effect on Reported Effectiveness
Ben Carterette
Pages: 747-750
doi>10.1145/2766462.2767812

Reusable test collections allow researchers to rapidly test different algorithms to find the one that works "best". But because of randomness in the topic sample, or in relevance judgments, or in interactions among system components, extreme results ...
Load-sensitive CPU Power Management for Web Search Engines
Matteo Catena, Craig Macdonald, Nicola Tonellotto
Pages: 751-754
doi>10.1145/2766462.2767809

Web search engine companies require power-hungry data centers with thousands of servers to efficiently perform searches on a large scale. This permits the search engines to serve high arrival rates of user queries with low latency, but poses economical ...
Retrieval from Noisy E-Discovery Corpus in the Absence of Training Data
Anirban Chakraborty, Kripabandhu Ghosh, Swapan Kumar Parui
Pages: 755-758
doi>10.1145/2766462.2767828

OCR errors hurt retrieval performance to a great extent. Research has been done on modelling and correction of OCR errors. However, most of the existing systems use language dependent resources or training texts for studying the nature of errors. Not ...
Opinion Spammer Detection in Web Forum
Yu-Ren Chen, Hsin-Hsi Chen
Pages: 759-762
doi>10.1145/2766462.2767766

In this paper, a real case study on opinion spammer detection in web forum is presented. We explore user profiles, maximum spamicity of first posts of users, burstiness of registration of user accounts, and frequent poster set to build a model with SVM ...
Multi-Faceted Recall of Continuous Active Learning for Technology-Assisted Review
Gordon V. Cormack, Maura R. Grossman
Pages: 763-766
doi>10.1145/2766462.2767771

Continuous active learning achieves high recall for technology-assisted review, not only for an overall information need, but also for various facets of that information need, whether explicit or implicit. Through simulations using Cormack and Grossman's ...
Time Pressure and System Delays in Information Search
Anita Crescenzi, Diane Kelly, Leif Azzopardi
Pages: 767-770
doi>10.1145/2766462.2767817

We report preliminary results of the impact of time pressure and system delays on search behavior from a laboratory study with forty-three participants. To induce time pressure, we randomly assigned half of our study participants to a treatment condition ...
How Random Decisions Affect Selective Distributed Search
Zhuyun Dai, Yubin Kim, Jamie Callan
Pages: 771-774
doi>10.1145/2766462.2767796

Selective distributed search is a retrieval architecture that reduces search costs by partitioning a corpus into topical shards such that only a few shards need to be searched for each query. Prior research created topical shards by using random seed ...
Comparing Approaches for Query Autocompletion
Giovanni Di Santo, Richard McCreadie, Craig Macdonald, Iadh Ounis
Pages: 775-778
doi>10.1145/2766462.2767829

Within a search engine, query auto-completion aims to predict the final query the user wants to enter as they type, with the aim of reducing query entry time and potentially preparing the search results in advance of query submission. There are a large ...
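A common baseline in query auto-completion ranks past queries that start with the typed prefix by their frequency in a query log (often called Most Popular Completion). The sketch below is that generic baseline over a toy in-memory log; the log contents and function name are hypothetical, and it does not reproduce any specific approach compared in the paper.

```python
from collections import Counter
from typing import Iterable, List


def most_popular_completions(query_log: Iterable[str], prefix: str, k: int = 3) -> List[str]:
    """Return up to k logged queries that start with `prefix`, most frequent first."""
    # Normalise logged queries so that case variants are counted together.
    counts = Counter(q.strip().lower() for q in query_log)
    prefix = prefix.lower()
    matches = [(q, n) for q, n in counts.items() if q.startswith(prefix)]
    # Most frequent first; alphabetical order breaks ties deterministically.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [q for q, _ in matches[:k]]


log = ["sigir 2015", "SIGIR 2015", "sigir deadline", "sigmod",
       "sigir 2015 program", "sigir deadline"]
print(most_popular_completions(log, "sigir", k=2))
```

The linear scan keeps the sketch short; a deployed system would more likely precompute the top completions per prefix, for example in a trie.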
Sign-Aware Periodicity Metrics of User Engagement for Online Search Quality Evaluation
Alexey Drutsa
Pages: 779-782
doi>10.1145/2766462.2767814

Modern Internet companies improve evaluation criteria of their data-driven decision-making that is based on online controlled experiments (also known as A/B tests). The amplitude metrics of user engagement are known to be well sensitive to service changes, ...
Modelling Term Dependence with Copulas
Carsten Eickhoff, Arjen P. de Vries, Thomas Hofmann
Pages: 783-786
doi>10.1145/2766462.2767831

Many generative language and relevance models assume conditional independence between the likelihood of observing individual terms. This assumption is obviously naive, but also hard to replace or relax. There are only very few term pairs that actually ...
Modeling Website Topic Cohesion at Scale to Improve Webpage Classification
Dhivya Eswaran, Paul N. Bennett, Joseph J. Pfeiffer, III
Pages: 787-790
doi>10.1145/2766462.2767834

Considerable work in web page classification has focused on incorporating the topical structure of the web (e.g., the hyperlink graph) to improve prediction accuracy. However, the majority of work has primarily focused on relational or graph-based methods ...
Topic-centric Classification of Twitter User's Political Orientation
Anjie Fang, Iadh Ounis, Philip Habel, Craig Macdonald, Nut Limsopatham
Pages: 791-794
doi>10.1145/2766462.2767833

In the recent Scottish Independence Referendum (hereafter, IndyRef), Twitter offered a broad platform for people to express their opinions, with millions of IndyRef tweets posted over the campaign period. In this paper, we aim to classify people's voting ...
Word Embedding based Generalized Language Model for Information Retrieval
Debasis Ganguly, Dwaipayan Roy, Mandar Mitra, Gareth J.F. Jones
Pages: 795-798
doi>10.1145/2766462.2767780

Word2vec, a state-of-the-art word embedding technique has gained a lot of interest in the NLP community. The embedding of the word vectors helps to retrieve a list of words that are used in similar contexts with respect to a given word. In this paper, ...
A Head-Weighted Gap-Sensitive Correlation Coefficient
Ning Gao, Douglas Oard
Pages: 799-802
doi>10.1145/2766462.2767793

Information retrieval systems rank documents, and shared-task evaluations yield results that can be used to rank information retrieval systems. Comparing rankings in ways that can yield useful insights is thus an important capability. When making such ...
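Rankings of systems are most often compared with Kendall's τ, which counts every swapped pair of systems equally, wherever it occurs in the ranking and however small the score gap. The title suggests the proposed coefficient weights the head of the ranking and the gaps, which plain τ ignores. The sketch computes ordinary Kendall's τ (the tau-a form, with tied pairs counted as neither concordant nor discordant) as a reference point only; it is not the paper's coefficient, and the system names and scores are made up.

```python
from itertools import combinations
from typing import Dict


def kendall_tau(scores_x: Dict[str, float], scores_y: Dict[str, float]) -> float:
    """Kendall's tau-a between two orderings of the same systems."""
    systems = sorted(scores_x)
    if set(systems) != set(scores_y) or len(systems) < 2:
        raise ValueError("need the same two or more systems in both score maps")
    concordant = discordant = 0
    for a, b in combinations(systems, 2):
        agreement = (scores_x[a] - scores_x[b]) * (scores_y[a] - scores_y[b])
        if agreement > 0:
            concordant += 1  # both evaluations order a and b the same way
        elif agreement < 0:
            discordant += 1  # the pair is swapped between the two evaluations
    n_pairs = len(systems) * (len(systems) - 1) // 2
    return (concordant - discordant) / n_pairs


x = {"A": 0.30, "B": 0.20, "C": 0.10}
y = {"A": 0.30, "B": 0.10, "C": 0.20}
print(kendall_tau(x, y))
```

Note that swapping B and C lowers τ by the same amount as swapping A and B would, which is exactly the insensitivity to position that head-weighted measures target.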
On Term Selection Techniques for Patent Prior Art Search
Mona Golestan Far, Scott Sanner, Mohamed Reda Bouadjenek, Gabriela Ferraro, David Hawking
Pages: 803-806
doi>10.1145/2766462.2767801

In this paper, we investigate the influence of term selection on retrieval performance on the CLEF-IP prior art test collection, using the Description section of the patent query with Language Model (LM) and BM25 scoring functions. We find that an oracular ...
Automatic Feature Generation on Heterogeneous Graph for Music Recommendation
Chun Guo, Xiaozhong Liu
Pages: 807-810
doi>10.1145/2766462.2767808

Online music streaming services (MSS) experienced exponential growth over the past decade. The giant MSS providers not only built massive music collection with metadata, they also accumulated large amount of heterogeneous data generated from users, e.g. ...
Differences in Eye-Tracking Measures Between Visits and Revisits to Relevant and Irrelevant Web Pages
Jacek Gwizdka, Yinglong Zhang
Pages: 811-814
doi>10.1145/2766462.2767795

This short paper presents initial results from a project, in which we investigated differences in how users view relevant and irrelevant Web pages on their visits and revisits. The users' viewing of Web pages was characterized by eye-tracking measures, ...
Reducing Hubness: A Cause of Vulnerability in Recommender Systems
Kazuo Hara, Ikumi Suzuki, Kei Kobayashi, Kenji Fukumizu
Pages: 815-818
doi>10.1145/2766462.2767823

It is known that memory-based collaborative filtering systems are vulnerable to shilling attacks. In this paper, we demonstrate that hubness, which occurs in high dimensional data, is exploited by the attacks. Hence we explore methods for reducing hubness ...
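Hubness is usually quantified with the k-occurrence count N_k(x): the number of points that include x among their k nearest neighbours. Because every point contributes exactly k neighbours, N_k averages to k, and hubs are points whose N_k is far above that average. The brute-force sketch computes N_k with Euclidean distance, an assumption made for simplicity (collaborative filtering often uses cosine or correlation similarity instead); it only measures hubness and does not implement the reduction methods studied in the paper. The toy points are hypothetical.

```python
import math
from typing import List, Sequence


def k_occurrence(points: Sequence[Sequence[float]], k: int) -> List[int]:
    """N_k for every point: how often it appears in other points' k-NN lists."""
    counts = [0] * len(points)
    for i, p in enumerate(points):
        # Distances from point i to every other point; ties broken by index.
        neighbours = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        for _, j in neighbours[:k]:
            counts[j] += 1
    return counts


points = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
print(k_occurrence(points, k=1))
```

Here the central point is the nearest neighbour of all four others and becomes a hub, while three of the outer points are never anyone's nearest neighbour.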
Modularity-Based Query Clustering for Identifying Users Sharing a Common Condition
Maayan Gal-On Harel, Elad Yom-Tov
Pages: 819-822
doi>10.1145/2766462.2767798

We present an algorithm for identifying users who share a common condition from anonymized search engine logs. Input to the algorithm is a set of seed phrases that identify users with the condition of interest with high precision albeit at a very low ...
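The title names modularity as the clustering criterion. For reference, the Newman-Girvan modularity of a partition of an undirected graph with m edges is Q = Σ_c [L_c / m − (d_c / 2m)²], where L_c is the number of edges inside community c and d_c is the total degree of its nodes. The sketch only evaluates Q for a given partition; how the paper builds its query graph and searches for a partition is not shown, and the toy graph (two triangles joined by one edge) is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def modularity(
    edges: Iterable[Tuple[Hashable, Hashable]], community: Dict[Hashable, int]
) -> float:
    """Newman-Girvan modularity Q of a partition of an undirected graph."""
    edge_list = list(edges)
    m = len(edge_list)
    degree: Dict[Hashable, int] = defaultdict(int)
    internal: Dict[int, int] = defaultdict(int)  # L_c: edges with both ends in c
    for u, v in edge_list:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    total_degree: Dict[int, int] = defaultdict(int)  # d_c: summed degree in c
    for node, d in degree.items():
        total_degree[community[node]] += d
    return sum(internal[c] / m - (total_degree[c] / (2 * m)) ** 2 for c in total_degree)


triangle_a = [("a1", "a2"), ("a2", "a3"), ("a1", "a3")]
triangle_b = [("b1", "b2"), ("b2", "b3"), ("b1", "b3")]
bridge = [("a3", "b1")]
partition = {"a1": 0, "a2": 0, "a3": 0, "b1": 1, "b2": 1, "b3": 1}
print(round(modularity(triangle_a + triangle_b + bridge, partition), 4))
```

Putting every node in a single community always gives Q = 0, so positive values indicate more internal edges than the degree-preserving random baseline predicts.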
Understanding Temporal Query Intent
Mohammed Hasanuzzaman, Sriparna Saha, Gaël Dias, Stéphane Ferrari
Pages: 823-826
doi>10.1145/2766462.2767792

Understanding the temporal orientation of web search queries is an important issue for the success of information access systems. In this paper, we propose a multi-objective ensemble learning solution that (1) allows to accurately classify queries along ...
On the Reusability of Open Test Collections
Seyyed Hadi Hashemi, Charles L.A. Clarke, Adriel Dean-Hall, Jaap Kamps, Julia Kiseleva
Pages: 827-830
doi>10.1145/2766462.2767788

Creating test collections for modern search tasks is increasingly more challenging due to the growing scale and dynamic nature of content, and need for richer contextualization of the statements of request. To address these issues, the TREC Contextual ...
Towards Vandalism Detection in Knowledge Bases: Corpus Construction and Analysis
Stefan Heindorf, Martin Potthast, Benno Stein, Gregor Engels
Pages: 831-834
doi>10.1145/2766462.2767804

We report on the construction of the Wikidata Vandalism Corpus WDVC-2015, the first corpus for vandalism in knowledge bases. Our corpus is based on the entire revision history of Wikidata, the knowledge base underlying Wikipedia. Among Wikidata's 24 ...
About the 'Compromised Information Need' and Optimal Interaction as Quality Measure for Search Interfaces
Eduard C. Hoenkamp
Pages: 835-838
doi>10.1145/2766462.2767800

Taylor's concept of levels of information need has been cited in over a hundred IR publications since his work was first published. It concerns the phases a searcher goes through, starting with the feeling that information seems missing, to expressing ...
I See You: Person-of-Interest Search in Social Networks
Hsun-Ping Hsieh, Cheng-Te Li, Rui Yan
Pages: 839-842
doi>10.1145/2766462.2767767

Searching for a particular person by specifying her name is one of the essential functions in online social networking services such as Facebook. So many times, however, one would like to find a person but what she knows is few social labels about the ...
Towards Quantifying the Impact of Non-Uniform Information Access in Collaborative Information Retrieval
Nyi Nyi Htun, Martin Halvey, Lynne Baillie
Pages: 843-846
doi>10.1145/2766462.2767779

The majority of research into Collaborative Information Retrieval (CIR) has assumed a uniformity of information access and visibility between collaborators. However in a number of real world scenarios, information access is not uniform between all collaborators ...
Features of Disagreement Between Retrieval Effectiveness Measures
Timothy Jones, Paul Thomas, Falk Scholer, Mark Sanderson
Pages: 847-850
doi>10.1145/2766462.2767824

Many IR effectiveness measures are motivated from intuition, theory, or user studies. In general, most effectiveness measures are well correlated with each other. But, what about where they don't correlate? Which rankings cause measures to disagree? ...
Subsequence Search in Event-Interval Sequences
Orestis Kostakis, Aristides Gionis
Pages: 851-854
doi>10.1145/2766462.2767778

We study the problem of subsequence search in databases of event-interval sequences, or e-sequences. In contrast to sequences of instantaneous events, e-sequences contain events that have a duration. In Information Retrieval applications, e-sequences ...
Searcher in a Strange Land: Understanding Web Search from Familiar and Unfamiliar Locations
Elad Kravi, Eugene Agichtein, Ido Guy, Yaron Kanza, Avihai Mejer, Dan Pelleg
Pages: 855-858
doi>10.1145/2766462.2767782

With mobile devices, web search is no longer limited to specific locations. People conduct search from practically anywhere, including at home, at work, when traveling and when on vacation. How should this influence search tools and web services? In ...
Evaluating Retrieval Models through Histogram Analysis
Kriste Krstovski, David A. Smith, Michael J. Kurtz
Pages: 859-862
doi>10.1145/2766462.2767821

We present a novel approach for efficiently evaluating the performance of retrieval models and introduce two evaluation metrics: Distributional Overlap (DO), which compares the clustering of scores of relevant and non-relevant documents, and Histogram ...
Inter-Category Variation in Location Search
Chia-Jung Lee, Nick Craswell, Vanessa Murdock
Pages: 863-866
doi>10.1145/2766462.2767797

When searching for place entities such as businesses or points of interest, the desired place may be close (finding the nearest ATM) or far away (finding a hotel in another city). Understanding the role of distance in predicting user interests can guide ...
Reachability based Ranking in Interactive Image Retrieval
Jiyi Li
Pages: 867-870
doi>10.1145/2766462.2767777

In some interactive image retrieval systems, users can select images from image search results and click to view their similar or related images until they reach the targets. Existing image ranking options are based on relevance, update time, interestingness ...
Modeling Multi-query Retrieval Tasks Using Density Matrix Transformation
Qiuchi Li, Jingfei Li, Peng Zhang, Dawei Song
Pages: 871-874
doi>10.1145/2766462.2767819

The quantum probabilistic framework has recently been applied to Information Retrieval (IR). A representative is the Quantum Language Model (QLM), which is developed for the ad-hoc retrieval with single queries and has achieved significant improvements ...
Predicting User Behavior in Display Advertising via Dynamic Collective Matrix Factorization
Sheng Li, Jaya Kawale, Yun Fu
Pages: 875-878
doi>10.1145/2766462.2767781

Conversion prediction and click prediction are two important and intertwined problems in display advertising, but existing approaches usually look at them in isolation. In this paper, we aim to predict the conversion response of users by jointly examining ...
Zero-shot Image Tagging by Hierarchical Semantic Embedding
Xirong Li, Shuai Liao, Weiyu Lan, Xiaoyong Du, Gang Yang
Pages: 879-882
doi>10.1145/2766462.2767773

Given the difficulty of acquiring labeled examples for many fine-grained visual classes, there is an increasing interest in zero-shot image tagging, aiming to tag images with novel labels that have no training examples present. Using a semantic space ...
Using Term Location Information to Enhance Probabilistic Information Retrieval
Baiyan Liu, Xiangdong An, Jimmy Xiangji Huang
Pages: 883-886
doi>10.1145/2766462.2767827

Nouns are more important than other parts of speech in information retrieval and are more often found near the beginning or the end of sentences. In this paper, we investigate the effects of rewarding terms based on their location in sentences on information ...
Learning Context-aware Latent Representations for Context-aware Collaborative Filtering
Xin Liu, Wei Wu
Pages: 887-890
doi>10.1145/2766462.2767775

In this paper, we propose a generic framework to learn context-aware latent representations for context-aware collaborative filtering. Contextual contents are combined via a function to produce the context influence factor, which is then combined with ...
Exploiting User and Business Attributes for Personalized Business Recommendation
Kai Lu, Yi Zhang, Lanbo Zhang, Shuxin Wang
Pages: 891-894
doi>10.1145/2766462.2767806

Data sparsity and cold-start are two major problems in personalized recommendation. They are especially severe in business recommendation, because business transactions are usually completed offline and customers generally do not provide ratings after ...
Speeding up Document Ranking with Rank-based Features
Claudio Lucchese, Franco Maria Nardini, Salvatore Orlando, Raffaele Perego, Nicola Tonellotto
Pages: 895-898
doi>10.1145/2766462.2767776

Learning to Rank (LtR) is an effective machine learning methodology for inducing high-quality document ranking functions. Given a query and a candidate set of documents, where query-document pairs are represented by feature vectors, a machine-learned ...
Mining Measured Information from Text
Arun S. Maiya, Dale Visser, Andrew Wan
Pages: 899-902
doi>10.1145/2766462.2767789

We present an approach to extract measured information from text (e.g., a 1370 °C melting point, a BMI greater than 29.9 kg/m²). Such extractions are critically important across a wide range of domains, especially those involving search ...
An Initial Investigation into Fixed and Adaptive Stopping Strategies
David Maxwell, Leif Azzopardi, Kalervo Järvelin, Heikki Keskustalo
Pages: 903-906
doi>10.1145/2766462.2767802
Full text: PDF

Most models, measures and simulations often assume that a searcher will stop at a predetermined place in a ranked list of results. However, during the course of a search session, real-world searchers will vary and adapt their interactions with a ranked ...
Regularised Cross-Modal Hashing
Sean Moran, Victor Lavrenko
Pages: 907-910
doi>10.1145/2766462.2767816
Full text: PDF

In this paper we propose Regularised Cross-Modal Hashing (RCMH), a new cross-modal hashing model that projects annotation and visual feature descriptors into a common Hamming space. RCMH optimises the hashcode similarity of related data-points in the ...
Adapted B-CUBED Metrics to Unbalanced Datasets
Jose G. Moreno, Gaël Dias
Pages: 911-914
doi>10.1145/2766462.2767836
Full text: PDF

B-CUBED metrics have recently been adopted in the evaluation of clustering results as well as in many other related tasks. However, this family of metrics is not well adapted when datasets are unbalanced. This issue is extremely frequent in Web results, ...
A Time-aware Random Walk Model for Finding Important Documents in Web Archives
Tu Ngoc Nguyen, Nattiya Kanhabua, Claudia Niederée, Xiaofei Zhu
Pages: 915-918
doi>10.1145/2766462.2767832
Full text: PDF

Due to their first-hand, diverse and evolution-aware reflection of nearly all areas of life, web archives are emerging as gold-mines for content analytics of many sorts. However, supporting search, which goes beyond navigational search via URLs, is a ...
A Test Collection for Spoken Gujarati Queries
Douglas W. Oard, Rashmi Sankepally, Jerome White, Aren Jansen, Craig Harman
Pages: 919-922
doi>10.1145/2766462.2767791
Full text: PDF

The development of a new test collection is described in which the task is to search naturally occurring spoken content using naturally occurring spoken queries. To support research on speech retrieval for low-resource settings, the collection includes ...
Discovering Experts across Multiple Domains
Aditya Pal
Pages: 923-926
doi>10.1145/2766462.2767774
Full text: PDF

Researchers have focused on finding experts in individual domains, such as emails, forums, question answering, blogs, and microblogs. In this paper, we propose an algorithm for finding experts across these different domains. To do this, we propose an ...
Using Key Concepts in a Translation Model for Retrieval
Jae Hyun Park, W. Bruce Croft
Pages: 927-930
doi>10.1145/2766462.2767768
Full text: PDF

Many queries, especially those in the form of longer questions, contain a subset of terms representing key concepts that describe the most important part of the user's information need. Detecting the key concepts in a query can be used as the basis for ...
On the Cost of Phrase-Based Ranking
Matthias Petri, Alistair Moffat
Pages: 931-934
doi>10.1145/2766462.2767769
Full text: PDF

Effective postings list compression techniques, and the efficiency of postings list processing schemes such as WAND, have significantly improved the practical performance of ranked document retrieval using inverted indexes. Recently, suffix array-based ...
Location-Aware Model for News Events in Social Media
Mauricio Quezada, Vanessa Peña-Araya, Barbara Poblete
Pages: 935-938
doi>10.1145/2766462.2767815
Full text: PDF

Nowadays, social media services are being used extensively as news sources and for spreading information on real-world events. Several studies have focused on detecting those events and locating them geographically. However, in order to study real-world ...
Exploring Opportunities to Facilitate Serendipity in Search
Ataur Rahman, Max L. Wilson
Pages: 939-942
doi>10.1145/2766462.2767783
Full text: PDF

Serendipitously discovering new information can bring many benefits. Although we can design systems to highlight serendipitous information, serendipity cannot be easily orchestrated and is thus hard to study. In this paper, we deployed a working search ...
Combining Orthogonal Information in Large-Scale Cross-Language Information Retrieval
Shigehiko Schamoni, Stefan Riezler
Pages: 943-946
doi>10.1145/2766462.2767805
Full text: PDF

System combination is an effective strategy to boost retrieval performance, especially in complex applications such as cross-language information retrieval (CLIR) where the aspects of translation and retrieval have to be optimized jointly. We focus on ...
Tailoring Music Recommendations to Users by Considering Diversity, Mainstreaminess, and Novelty
Markus Schedl, David Hauger
Pages: 947-950
doi>10.1145/2766462.2767763
Full text: PDF

A shortcoming of current approaches for music recommendation is that they consider user-specific characteristics only on a very simple level, typically as some kind of interaction between users and items when employing collaborative filtering. To alleviate ...
Challenges of Mathematical Information Retrieval in the NTCIR-11 Math Wikipedia Task
Moritz Schubotz, Abdou Youssef, Volker Markl, Howard S. Cohl
Pages: 951-954
doi>10.1145/2766462.2767787
Full text: PDF

Mathematical Information Retrieval concerns retrieving information related to a particular mathematical concept. The NTCIR-11 Math Task develops an evaluation test collection for document sections retrieval of scientific articles based on human generated ...
Probabilistic Multileave for Online Retrieval Evaluation
Anne Schuth, Robert-Jan Bruintjes, Fritjof Büttner, Joost van Doorn, Carla Groenland, Harrie Oosterhuis, Cong-Nguyen Tran, Bas Veeling, Jos van der Velde, Roger Wechsler, David Woudenberg, Maarten de Rijke
Pages: 955-958
doi>10.1145/2766462.2767838
Full text: PDF

Online evaluation methods for information retrieval use implicit signals such as clicks from users to infer preferences between rankers. A highly sensitive way of inferring these preferences is through interleaved comparisons. Recently, interleaved comparisons ...
Twitter Sentiment Analysis with Deep Convolutional Neural Networks
Aliaksei Severyn, Alessandro Moschitti
Pages: 959-962
doi>10.1145/2766462.2767830
Full text: PDF

This paper describes our deep learning system for sentiment analysis of tweets. The main contribution of this work is a new model for initializing the parameter weights of the convolutional neural network, which is crucial to train an accurate model ...
Anchoring and Adjustment in Relevance Estimation
Milad Shokouhi, Ryen White, Emine Yilmaz
Pages: 963-966
doi>10.1145/2766462.2767841
Full text: PDF

People's tendency to overly rely on prior information has been well studied in psychology in the context of anchoring and adjustment. Anchoring biases pervade many aspects of human behavior. In this paper, we present a study of anchoring bias in information ...
Cognitive Activity during Web Search
Md. Hedayetul Islam Shovon, D (Nanda) Nandagopal, Jia Tina Du, Ramasamy Vijayalakshmi, Bernadine Cocks
Pages: 967-970
doi>10.1145/2766462.2767784
Full text: PDF

Searching on the Web or Net-surfing is a part of everyday life for many people, but little is known about the brain activity during Web searching. Such knowledge is essential for better understanding of the cognitive demands imposed by the search system ...
Personalized Semantic Ranking for Collaborative Recommendation
Song Xu, Shu Wu, Liang Wang
Pages: 971-974
doi>10.1145/2766462.2767772
Full text: PDF

Recently, a ranking view of collaborative recommendation has received much attention in recommendation systems. Most existing ranking approaches are based on the pairwise assumption, i.e., everything that has not been selected is of less interest for a ...
Active Learning for Entity Filtering in Microblog Streams
Damiano Spina, Maria-Hendrike Peetz, Maarten de Rijke
Pages: 975-978
doi>10.1145/2766462.2767839
Full text: PDF

Monitoring the reputation of entities such as companies or brands in microblog streams (e.g., Twitter) starts by selecting mentions that are related to the entity of interest. Entities are often ambiguous (e.g., "Jaguar" or "Ford") and effective methods ...
Relevance-aware Filtering of Tuples Sorted by an Attribute Value via Direct Optimization of Search Quality Metrics
Nikita V. Spirin, Mikhail Kuznetsov, Julia Kiseleva, Yaroslav V. Spirin, Pavel A. Izhutov
Pages: 979-982
doi>10.1145/2766462.2767822
Full text: PDF

Sorting tuples by an attribute value is a common search scenario and many search engines support such capabilities, e.g. price-based sorting in e-commerce, time-based sorting on a job or social media website. However, sorting purely by the attribute ...
Multi-source Information Fusion for Personalized Restaurant Recommendation
Jing Sun, Yun Xiong, Yangyong Zhu, Junming Liu, Chu Guan, Hui Xiong
Pages: 983-986
doi>10.1145/2766462.2767818
Full text: PDF

In this paper, we study the problem of personalized restaurant recommendations. Specifically, we develop a probabilistic factor analysis framework, named RMSQ-MF, which is able to exploit multi-source information, such as the users' task, ...
Joint Matrix Factorization and Manifold-Ranking for Topic-Focused Multi-Document Summarization
Jiwei Tan, Xiaojun Wan, Jianguo Xiao
Pages: 987-990
doi>10.1145/2766462.2767765
Full text: PDF

Manifold-ranking has proved to be an effective method for topic-focused multi-document summarization. As the basic manifold-ranking-based summarization method constructs the relationships between sentences simply by the bag-of-words cosine similarity, we ...
Towards Understanding the Impact of Length in Web Search Result Summaries over a Speech-only Communication Channel
Johanne R. Trippas, Damiano Spina, Mark Sanderson, Lawrence Cavedon
Pages: 991-994
doi>10.1145/2766462.2767826
Full text: PDF

Presenting search results over a speech-only communication channel involves a number of challenges for users due to cognitive limitations and the serial nature of speech. We investigated the impact of search result summary length in speech-based web ...
Early Detection of Topical Expertise in Community Question Answering
David van Dijk, Manos Tsagkias, Maarten de Rijke
Pages: 995-998
doi>10.1145/2766462.2767840
Full text: PDF

We focus on detecting potential topical experts in community question answering platforms early on in their lifecycle. We use a semi-supervised machine learning approach. We extract three types of feature: (i) textual, (ii) behavioral, and (iii) time-aware, ...
LBMCH: Learning Bridging Mapping for Cross-modal Hashing
Yang Wang, Xuemin Lin, Lin Wu, Wenjie Zhang, Qing Zhang
Pages: 999-1002
doi>10.1145/2766462.2767825
Full text: PDF

Hashing has gained considerable attention on large-scale similarity search, due to its enjoyable efficiency and low storage cost. In this paper, we study the problem of learning hash functions in the context of multi-modal data for cross-modal similarity ...
Gibberish, Assistant, or Master?: Using Tweets Linking to News for Extractive Single-Document Summarization
Zhongyu Wei, Wei Gao
Pages: 1003-1006
doi>10.1145/2766462.2767835
Full text: PDF

Single-document summarization is a challenging task. In this paper, we explore effective ways of using the tweets linking to news to generate an extractive summary of each document. We reveal the very basic value of tweets that can be utilized by regarding ...
Context-aware Point-of-Interest Recommendation Using Tensor Factorization with Social Regularization
Lina Yao, Quan Z. Sheng, Yongrui Qin, Xianzhi Wang, Ali Shemshadi, Qi He
Pages: 1007-1010
doi>10.1145/2766462.2767794
Full text: PDF

Point-of-Interest (POI) recommendation is a new type of recommendation task that comes along with the prevalence of location-based social networks in recent years. Compared with traditional tasks, it focuses more on personalized, context-aware recommendation ...
Adaptive User Engagement Evaluation via Multi-task Learning
Hamed Zamani, Pooya Moradi, Azadeh Shakery
Pages: 1011-1014
doi>10.1145/2766462.2767785
Full text: PDF

User engagement evaluation task in social networks has recently attracted considerable attention due to its applications in recommender systems. In this task, the posts containing users' opinions about items, e.g., the tweets containing the users' ratings ...
Compact Snippet Caching for Flash-based Search Engines
Rui Zhang, Pengyu Sun, Jiancong Tong, Rebecca Jane Stones, Gang Wang, Xiaoguang Liu
Pages: 1015-1018
doi>10.1145/2766462.2767764
Full text: PDF

In response to a user query, search engines return the top-k relevant results, each of which contains a small piece of text, called a snippet, extracted from the corresponding document. Obtaining a snippet is time consuming as it requires both document ...
When Personalization Meets Conformity: Collective Similarity based Multi-Domain Recommendation
Xi Zhang, Jian Cheng, Shuang Qiu, Zhenfeng Zhu, Hanqing Lu
Pages: 1019-1022
doi>10.1145/2766462.2767810
Full text: PDF

Existing recommender systems place emphasis on personalization to achieve promising accuracy. However, in the context of multiple domains, users are likely to seek the same behaviors as domain authorities. This conformity effect provides a wealth of prior ...
Sub-document Timestamping of Web Documents
Yue Zhao, Claudia Hauff
Pages: 1023-1026
doi>10.1145/2766462.2767803
Full text: PDF

Knowledge about a (Web) document's creation time has been shown to be an important factor in various temporal information retrieval settings. Commonly, it is assumed that such documents were created at a single point in time. While this assumption may ...
DEMONSTRATION SESSION: Demonstrations
DINFRA: A One Stop Shop for Computing Multilingual Semantic Relatedness
Siamak Barzegar, Juliano Efson Sales, Andre Freitas, Siegfried Handschuh, Brian Davis
Pages: 1027-1028
doi>10.1145/2766462.2767870
Full text: PDF

This demonstration presents an infrastructure for computing multilingual semantic relatedness and correlation for twelve natural languages by using three distributional semantic models (DSMs). Our demonstrator, DInfra (Distributional Infrastructure) ...
VenueMusic: A Venue-Aware Music Recommender System
Zhiyong Cheng, Jialie Shen
Pages: 1029-1030
doi>10.1145/2766462.2767869
Full text: PDF

Users' music preferences can be greatly influenced by their location and environment nearby. In this demonstration, we present an intelligent music recommender system, called VenueMusic, to automatically identify suitable music for various popular venues ...
Shiny on Your Crazy Diagonal
Giorgio Maria Di Nunzio
Pages: 1031-1032
doi>10.1145/2766462.2767867
Full text: PDF

In this demo, we present a web application which allows users to interact with two retrieval models, namely the Binary Independence Model (BIM) and the BM25 model, on a standard TREC collection. The goal of this demo is to give students deeper insight ...
CricketLinking: Linking Event Mentions from Cricket Match Reports to Ball Entities in Commentaries
Manish Gupta
Pages: 1033-1034
doi>10.1145/2766462.2767865
Full text: PDF

The 2011 Cricket World Cup final match was watched by around 135 million people. Such a huge viewership demands a great experience for users of online cricket portals. Many portals like espncricinfo.com host a variety of content related to recent matches ...
An Aspect-driven Social Media Explorer
Nedim Lipka, W. Bruce Croft
Pages: 1035-1036
doi>10.1145/2766462.2767864
Full text: PDF

We demonstrate an exploration tool that organizes social media content under diverse aspects enabling comprehensive explorations. Unlike existing approaches that group content by trending topics, we present a holistic view of diverse and relevant content ...
ERICA: Expert Guidance in Validating Crowd Answers
Nguyen Quoc Viet Hung, Duong Chi Thang, Matthias Weidlich, Karl Aberer
Pages: 1037-1038
doi>10.1145/2766462.2767866
Full text: PDF

Crowdsourcing has become an essential tool for a broad range of Web applications. Yet the wide-ranging levels of expertise of crowd workers, as well as the presence of faulty workers, call for quality control of the crowdsourcing result. To this end, many ...
Large-scale Image Retrieval using Neural Net Descriptors
David Novak, Michal Batko, Pavel Zezula
Pages: 1039-1040
doi>10.1145/2766462.2767868
Full text: PDF
Galean: Visualization of Geolocated News Events from Social Media
Vanessa Peña-Araya, Mauricio Quezada, Barbara Poblete
Pages: 1041-1042
doi>10.1145/2766462.2767862
Full text: PDF

Online Social Networks (OSN) have changed the way information is produced and consumed. Organizing and retrieving unstructured data extracted from these platforms is not an easy task. Galean is a visual and interactive tool that aims to help journalists ...
SciNet: Interactive Intent Modeling for Information Discovery
Tuukka Ruotsalo, Jaakko Peltonen, Manuel J.A. Eugster, Dorota Głowacka, Aki Reijonen, Giulio Jacucci, Petri Myllymäki, Samuel Kaski
Pages: 1043-1044
doi>10.1145/2766462.2767863
Full text: PDF

Current search engines offer limited assistance for exploration and information discovery in complex search tasks. Instead, users are distracted by the need to focus their cognitive efforts on finding navigation cues, rather than selecting relevant information. ...
Linse: A Distributional Semantics Entity Search Engine
Juliano Efson Sales, André Freitas, Siegfried Handschuh, Brian Davis
Pages: 1045-1046
doi>10.1145/2766462.2767871
Full text: PDF

Entering 'Football Players from United States' when searching for 'American Footballers' is an example of vocabulary mismatch, which occurs when different words are used to express the same concepts. In order to address this phenomenon for entity search ...
Online News Tracking for Ad-Hoc Queries
Jeroen B.P. Vuurens, Arjen P. de Vries, Roi Blanco, Peter Mika
Pages: 1047-1048
doi>10.1145/2766462.2767872
Full text: PDF

Following news about a specific event can be a difficult task as new information is often scattered across web pages. An up-to-date summary of the event would help to inform users and allow them to navigate to articles that are likely to contain relevant ...
DUMPLING: A Novel Dynamic Search Engine
Andrew Jie Zhou, Jiyun Luo, Hui Yang
Pages: 1049-1050
doi>10.1145/2766462.2767873
Full text: PDF

In this demo paper, we introduce a new search engine that supports Information Retrieval (IR) in a dynamic setting. A dynamic search engine distinguishes itself by handling rich interactions and temporal dependency among the queries in a session or for ...
SESSION: Doctoral Consortium
Promoting User Engagement and Learning in Amorphous Search Tasks
Piyush Arora
Pages: 1051-1051
doi>10.1145/2766462.2767848
Full text: PDF

Much research in information retrieval (IR) focuses on optimization of the rank of relevant retrieval results for single shot ad hoc IR tasks. Relatively little research has been carried out on user engagement to support more complex search tasks. We ...
Cross-Platform Question Routing for Better Question Answering
Mossaab Bagdouri
Pages: 1053-1053
doi>10.1145/2766462.2767849
Full text: PDF

The last two decades have seen an increasing interest in the task of question answering (QA). Earlier approaches focused on automated retrieval and extraction models. Recent developments have more focus on community driven QA. This work addresses this ...
Time Pressure in Information Search
Anita Crescenzi
Pages: 1055-1055
doi>10.1145/2766462.2767851
Full text: PDF

The primary purpose of this research is to explore the impact of perceived time pressure on search behaviors, searcher perceptions of the search system and the search experience. Are there observable behavioral changes when a searcher is time-pressured? ...
Controversy Detection and Stance Analysis
Shiri Dori-Hacohen
Pages: 1057-1057
doi>10.1145/2766462.2767844
Full text: PDF

Alerting users about controversial search results can encourage critical literacy, promote healthy civic discourse and counteract the "filter bubble" effect. Additionally, presenting information to the user about the different stances or sides of the ...
Using Contextual Information to Understand Searching and Browsing Behavior
Julia Kiseleva
Pages: 1059-1059
doi>10.1145/2766462.2767852
Full text: PDF

There is great imbalance in the richness of information on the web and the succinctness and poverty of search requests of web users, making their queries only a partial description of the underlying complex information needs. Finding ways to better leverage ...
Transfer Learning for Information Retrieval
Pengfei Li
Pages: 1061-1061
doi>10.1145/2766462.2767845
Full text: PDF
Enhancing Mathematics Information Retrieval
Martin Líška
Pages: 1063-1063
doi>10.1145/2766462.2767843
Full text: PDF
Improving Search using Proximity-Based Statistics
Xiaolu Lu
Pages: 1065-1065
doi>10.1145/2766462.2767847
Full text: PDF
Spoken Conversational Search: Information Retrieval over a Speech-only Communication Channel
Johanne R. Trippas
Pages: 1067-1067
doi>10.1145/2766462.2767850
Full text: PDF
Finding Answers in Web Search
Evi Yulianti
Pages: 1069-1069
doi>10.1145/2766462.2767846
Full text: PDF

There are many informational queries that could be answered with a text passage, thereby not requiring the searcher to access the full web document. When building manual annotations of answer passages for TREC queries, Keikha et al. [6] confirmed that ...
SESSION: Industry Track Preface
Hang Li, Jaime Teevan
Full text: PDF

It is our great pleasure to welcome you to the SIGIR Symposium on Information Retrieval in Practice (SIRIP 2015). The goal of SIRIP is to bring together information retrieval researchers, practitioners, analysts, and consumers, and to achieve ...
SESSION: Industry Track Invited Talks
From Web Search Relevance to Vertical Search Relevance
Yi Chang
Pages: 1073-1073
doi>10.1145/2766462.2776787
Full text: PDF

Web search relevance is a billion dollar challenge, while there is a disadvantage of backwardness in web search competition. Vertical search result can be incorporated to enrich web search content, therefore vertical search relevance is critical to provide ...
Finding Money in the Haystack: Information Retrieval at Bloomberg
Jonathan J. Dorando, Konstantine Arkoudas, Parth Vasa, Gary Kazantsev, Gideon Mann
Pages: 1075-1075
doi>10.1145/2766462.2776782
Full text: PDF

The financial markets are a rich domain for search, and it is not simple to serve the entire scope of financial professionals, who make their living on accurate, timely, and deep information. The data sources are many and disparate. This includes domains ...
If SIGIR had an Academic Track, What Would Be In It?
David Hawking
Pages: 1077-1077
doi>10.1145/2766462.2776784
Full text: PDF

It used to be the case that very little industry research was presented at SIGIR. Now the balance has radically changed: many accepted papers have industry authors and many rely on industry data sets, to the extent that a leading academic member ...
WeChat Search & Headline: Sogou Joins Force with Tencent on Mobile Search
Chao Liu
Pages: 1079-1079
doi>10.1145/2766462.2776781
Full text: PDF

Tencent Inc. is the biggest social network company in China. Its WeChat and QQ boast of 700 million and 800 million monthly active users (MAU), respectively. Sogou Inc., on the other hand, is a search leader in China, being the No. 2 and No. 3 on mobile/PC ...
Structure, Personalization, Scale: A Deep Dive into LinkedIn Search
Asif Makhani
Pages: 1081-1081
doi>10.1145/2766462.2776785
Full text: PDF

All of us are familiar with search as users. And as software engineers, many of us have worked on search problems in the context of web search, site search, or enterprise search. But search at LinkedIn is different. Our corpus is a richly structured ...
Location in Search
Vanessa Murdock
Pages: 1083-1083
doi>10.1145/2766462.2776783
Full text: PDF

As users turn increasingly to handheld devices to find information, the research community has focused on real-time location signals (GPS signals) to improve search engine effectiveness. Location signals have been investigated for predicting businesses ...
Challenges and Opportunities in Online Evaluation of Search Engines
Pavel Serdyukov
Pages: 1085-1085
doi>10.1145/2766462.2776786
Full text: PDF

Yandex is one of the largest Internet companies in Europe, operating Russia's most popular search engine, generating 58.6% of all search traffic in Russia (as of April 2015). Like all modern search engines, Yandex increasingly relies on online evaluation ...
Lower Search Cost
Dou Shen
Pages: 1087-1087
doi>10.1145/2766462.2776788
Full text: PDF

Web search is actually a pretty heavy task for most users since people need to launch a search engine's portal, phrase the right query and then go through search results to find the right information or service. To lower the search cost, commercial search ...
SESSION: Industry Track Refereed Papers
Practical Lessons for Gathering Quality Labels at Scale
Omar Alonso
Pages: 1089-1092
doi>10.1145/2766462.2776778
Full text: PDF

Information retrieval researchers and engineers use human computation as a mechanism to produce labeled data sets for product development, research and experimentation. To gather useful results, a successful labeling task relies on many different elements: ...
Incremental Sampling of Query Logs
Ricardo Baeza-Yates
Pages: 1093-1096
doi>10.1145/2766462.2776780
Full text: PDF

We introduce a simple technique to generate incremental query log samples that mimics well the original query distribution. In this way, editorial judgments for new queries can be consistently added to previous judgments. We also review the problem of ...
Where to Go on Your Next Trip?: Optimizing Travel Destinations Based on User Preferences
Julia Kiseleva, Melanie J.I. Mueller, Lucas Bernardi, Chad Davis, Ivan Kovacek, Mats Stafseng Einarsen, Jaap Kamps, Alexander Tuzhilin, Djoerd Hiemstra
Pages: 1097-1100
doi>10.1145/2766462.2776777
Full text: PDF

Recommendation based on user preferences is a common task for e-commerce websites. New recommendation algorithms are often evaluated by offline comparison to baseline algorithms such as recommending random or the most popular items. Here, we investigate ...
Bringing Order to the Job Market: Efficient Job Offer Categorization in E-Recruitment
Emmanuel Malherbe, Mario Cataldi, Andrea Ballatore
Pages: 1101-1104
doi>10.1145/2766462.2776779
Full text: PDF

E-recruitment uses a range of web-based technologies to find, evaluate, and hire new personnel for organizations. A crucial challenge in this arena lies in the categorization of job offers: candidates and operators often explore and analyze large numbers ...
TUTORIAL SESSION: Tutorials
Yoelle Maarek
Full text: PDF

This year's conference received twelve submissions, of which eight were accepted, and one was extended to an additional half-day. The decision was based on criteria of relevance to the SIGIR community, core quality and experience of presenters. The accepted ...
Building and Using Models of Information Seeking, Search and Retrieval: Full Day Tutorial
Leif Azzopardi, Guido Zuccon
Pages: 1107-1110
doi>10.1145/2766462.2767874
Full text: PDF

Understanding how people interact with information systems when searching is central to the study of Interactive Information Retrieval (IIR). While much of the prior work in this area has either been conceptual, observational or empirical, recently there ...
Advanced Click Models and their Applications to IR: SIGIR 2015 Tutorial
Aleksandr Chuklin, Ilya Markov, Maarten de Rijke
Pages: 1111-1112
doi>10.1145/2766462.2767882
Full text: PDF

This tutorial is concerned with more advanced and more recent topics in the area of click models. Here, we discuss recent developments in the area with a particular focus on applications of click models. The tutorial features a guest talk and a live demo ...
An Introduction to Click Models for Web Search: SIGIR 2015 Tutorial
Aleksandr Chuklin, Ilya Markov, Maarten de Rijke
Pages: 1113-1115
doi>10.1145/2766462.2767881
Full text: PDF

In this introductory tutorial we give an overview of click models for web search. We show how the framework of probabilistic graphical models helps to explain user behavior, build new evaluation metrics and perform simulations. The tutorial is augmented ...
IR Evaluation: Modeling User Behavior for Measuring Effectiveness
Charles L.A. Clarke, Mark D. Smucker, Emine Yilmaz
Pages: 1117-1120
doi>10.1145/2766462.2767876
Full text: PDF

This half-day tutorial on IR evaluation combines an introduction to classical IR evaluation methods with material on more recent user-oriented approaches. We primarily focus on off-line evaluation, but some material on on-line evaluation is also covered. ...
Information Retrieval with Verbose Queries
Manish Gupta, Michael Bendersky
Pages: 1121-1124
doi>10.1145/2766462.2767877
Full text: PDF

Recently, the focus of many novel search applications shifted from short keyword queries to verbose natural language queries. Examples include question answering systems and dialogue systems, voice search on mobile devices and entity search engines like ...
Revisiting the Foundations of IR: Timeless, Yet Timely
Paul B. Kantor
Pages: 1125-1127
doi>10.1145/2766462.2767878
Full text: PDF

As we face an explosion of potential new applications for the fundamental concepts and technologies of information retrieval, ranging from ad ranking to social media, from collaborative recommending to question answering systems, many researchers are ...
IR Evaluation: Designing an End-to-End Offline Evaluation Pipeline
Jin Young Kim, Emine Yilmaz
Pages: 1129-1132
doi>10.1145/2766462.2767875
Full text: PDF

This tutorial aims to provide attendees with a detailed understanding of an end-to-end evaluation pipeline based on human judgments (offline measurement). The tutorial will give an overview of the state of the art methods, techniques, and metrics necessary ...
Music Retrieval and Recommendation: A Tutorial Overview
Peter Knees, Markus Schedl
Pages: 1133-1136
doi>10.1145/2766462.2767880
Full text: PDF

In this tutorial, we give an introduction to the field of and state of the art in music information retrieval (MIR). The tutorial particularly spotlights the question of music similarity, which is an essential aspect in music retrieval and recommendation. ...
Exploiting Wikipedia for Information Retrieval Tasks
Bracha Shapira, Nir Ofek, Victor Makarenkov
Pages: 1137-1140
doi>10.1145/2766462.2767879
Full text: PDF

Wikipedia, the online encyclopedia, has long been used as a source of information for researchers, as well as being a subject of research itself. Wikipedia has been shown to be effective in recommender systems, sentiment analysis, validation and multiple ...
WORKSHOP SESSION: Workshops
Fernando Diaz, Diane Kelly
Full text: PDF

We are pleased to introduce the Workshop Program for the 38th Annual SIGIR Conference. We received 14 workshop proposals, each of which was peer-reviewed by three members of the Workshops PC. After discussion of all submissions in the Workshops PC, as ...
Web Question Answering: Beyond Factoids: SIGIR 2015 Workshop
Eugene Agichtein, David Carmel, Charles L.A. Clarke, Praveen Paritosh, Dan Pelleg, Idan Szpektor
Pages: 1143-1143
doi>10.1145/2766462.2767861
Full text: PDF
Graph Search and Beyond: SIGIR 2015 Workshop Summary
Omar Alonso, Marti A. Hearst, Jaap Kamps
Pages: 1145-1146
DOI: 10.1145/2766462.2767855

Modern Web data is highly structured in terms of entities and relations from large knowledge resources, geo-temporal references and social network structure, resulting in a massive multidimensional graph. This graph essentially unifies both the searcher ...
SIGIR 2015 Workshop on Reproducibility, Inexplicability, and Generalizability of Results (RIGOR)
Jaime Arguello, Fernando Diaz, Jimmy Lin, Andrew Trotman
Pages: 1147-1148
DOI: 10.1145/2766462.2767858
SIGIR 2015 Workshop on Temporal, Social and Spatially-aware Information Access (#TAIA2015)
Klaus Berberich, James Caverlee, Miles Efron, Claudia Hauff, Vanessa Murdock, Milad Shokouhi, Bart Thomee
Pages: 1149-1150
DOI: 10.1145/2766462.2767860

In this workshop we aim to bring together practitioners and researchers to discuss their recent breakthroughs and the challenges with addressing spatial and temporal information access, both from the algorithmic and the architectural perspectives.
NeuroIR 2015: Neuro-Physiological Methods in IR Research
Jacek Gwizdka, Joemon Jose, Javed Mostafa, Max Wilson
Pages: 1151-1153
DOI: 10.1145/2766462.2767856

This Tutorial+Workshop will discuss opportunities and challenges involved in using neuro-physiological tools/techniques (such as fMRI, fNIRS, EEG, eye-tracking, GSR, HR, and facial expressions) and theories in information retrieval. The hybrid format ...
SPS'15: 2015 International Workshop on Social Personalization & Search
Christoph Trattner, Denis Parra, Peter Brusilovsky, Leandro Marinho
Pages: 1155-1155
DOI: 10.1145/2766462.2767859
Privacy-Preserving IR 2015: When Information Retrieval Meets Privacy and Security
Hui Yang, Ian Soboroff
Pages: 1157-1158
DOI: 10.1145/2766462.2767857

Information retrieval (IR) and information privacy/security are two fast-growing computer science disciplines. There are many synergies and connections between these two disciplines. However, there have been very limited efforts to connect the two important ...

Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval
Table of Contents
SESSION: Athena award lecture
Marti A. Hearst
Putting searchers into search
Susan T. Dumais
Pages: 1-2
DOI: 10.1145/2600428.2617557

Over the last two decades the information retrieval landscape has changed dramatically. Twenty years ago, there were fewer than 3k web sites and the earliest web search engines indexed approximately 50k pages. Today, search engines index billions of ...
SESSION: Session 1a: risks and rewards
Diane Kelly
Modelling interaction with economic models of search
Leif Azzopardi
Pages: 3-12
DOI: 10.1145/2600428.2609574

Understanding how people interact when searching is central to the study of Interactive Information Retrieval (IIR). Most of the prior work has either been conceptual, observational or empirical. While this has led to numerous insights and findings regarding ...
Query-performance prediction: setting the expectations straight
Fiana Raiber, Oren Kurland
Pages: 13-22
DOI: 10.1145/2600428.2609581

The query-performance prediction task has been described as estimating retrieval effectiveness in the absence of relevance judgments. The expectations throughout the years were that improved prediction techniques would translate to improved retrieval ...
Hypothesis testing for the risk-sensitive evaluation of retrieval systems
B. Taner Dinçer, Craig Macdonald, Iadh Ounis
Pages: 23-32
DOI: 10.1145/2600428.2609625

The aim of risk-sensitive evaluation is to measure when a given information retrieval (IR) system does not perform worse than a corresponding baseline system for any topic. This paper argues that risk-sensitive evaluation is akin to the underlying methodology ...
SESSION: Session 1b: #microblog #sigir2014
Hang Li
Temporal feedback for tweet search with non-parametric density estimation
Miles Efron, Jimmy Lin, Jiyin He, Arjen de Vries
Pages: 33-42
DOI: 10.1145/2600428.2609575

This paper investigates the temporal cluster hypothesis: in search tasks where time plays an important role, do relevant documents tend to cluster together in time? We explore this question in the context of tweet search and temporal feedback: starting ...
Fine-grained location extraction from tweets with temporal awareness
Chenliang Li, Aixin Sun
Pages: 43-52
DOI: 10.1145/2600428.2609582

Twitter is a popular platform for sharing activities, plans, and opinions. Through tweets, users often reveal their location information and short term visiting plans. In this paper, we are interested in extracting fine-grained locations mentioned in ...
Collaborative personalized Twitter search with topic-language models
Jan Vosecky, Kenneth Wai-Ting Leung, Wilfred Ng
Pages: 53-62
DOI: 10.1145/2600428.2609584

The vast amount of real-time and social content in microblogs results in an information overload for users when searching microblog data. Given the user's search query, delivering content that is relevant to her interests is a challenging problem. Traditional ...
SESSION: Session 1c: recommendation
Jamie Callan
Gaussian process factorization machines for context-aware recommendations
Trung V. Nguyen, Alexandros Karatzoglou, Linas Baltrunas
Pages: 63-72
DOI: 10.1145/2600428.2609623

Context-aware recommendation (CAR) can lead to significant improvements in the relevance of the recommended items by modeling the nuanced ways in which context influences preferences. The dominant approach in context-aware recommendation has been the ...
Addressing cold start in recommender systems: a semi-supervised co-training algorithm
Mi Zhang, Jie Tang, Xuchen Zhang, Xiangyang Xue
Pages: 73-82
DOI: 10.1145/2600428.2609599

Cold start is one of the most challenging problems in recommender systems. In this paper we tackle the cold-start problem by proposing a context-aware semi-supervised co-training method named CSEL. Specifically, we use a factorization model to capture ...
Explicit factor models for explainable recommendation based on phrase-level sentiment analysis
Yongfeng Zhang, Guokun Lai, Min Zhang, Yi Zhang, Yiqun Liu, Shaoping Ma
Pages: 83-92
DOI: 10.1145/2600428.2609579

Collaborative Filtering (CF)-based recommendation algorithms, such as Latent Factor Models (LFM), work well in terms of prediction accuracy. However, the latent features make it difficult to explain the recommendation results to the users. Fortunately, ...
SESSION: Session 2a: (i can't get no) satisfaction
Justin Zobel
Context-aware web search abandonment prediction
Yang Song, Xiaolin Shi, Ryen White, Ahmed Hassan Awadallah
Pages: 93-102
DOI: 10.1145/2600428.2609604

Web search queries without hyperlink clicks are often referred to as abandoned queries. Understanding the reasons for abandonment is crucial for search engines in evaluating their performance. Abandonment can be categorized as good or bad depending on ...
Impact of response latency on user behavior in web search
Ioannis Arapakis, Xiao Bai, B. Barla Cambazoglu
Pages: 103-112
DOI: 10.1145/2600428.2609627

Traditionally, the efficiency and effectiveness of search systems have both been of great interest to the information retrieval community. However, an in-depth analysis on the interplay between the response latency of web search systems and users' search ...
Towards better measurement of attention and satisfaction in mobile search
Dmitry Lagun, Chih-Hung Hsieh, Dale Webster, Vidhya Navalpakkam
Pages: 113-122
DOI: 10.1145/2600428.2609631

Web Search has seen two big changes recently: rapid growth in mobile search traffic, and an increasing trend towards providing answer-like results for relatively simple information needs (e.g., [weather today]). Such results display the answer or relevant ...
Modeling action-level satisfaction for search task satisfaction prediction
Hongning Wang, Yang Song, Ming-Wei Chang, Xiaodong He, Ahmed Hassan, Ryen W. White
Pages: 123-132
DOI: 10.1145/2600428.2609607

Search satisfaction is a property of a user's search process. Understanding it is critical for search providers to evaluate the performance and improve the effectiveness of search engines. Existing methods model search satisfaction holistically at the ...
SESSION: Session 2b: doctors and lawyers
Leif Azzopardi
Circumlocution in diagnostic medical queries
Isabelle Stanton, Samuel Ieong, Nina Mishra
Pages: 133-142
DOI: 10.1145/2600428.2609589

Circumlocution is when many words are used to describe what could be said with fewer, e.g., "a machine that takes moisture out of the air" instead of "dehumidifier." Web search is a perfect backdrop for circumlocution where people struggle to name what ...
Interactions between health searchers and search engines
Georg P. Schoenherr, Ryen W. White
Pages: 143-152
DOI: 10.1145/2600428.2609602

The Web is an important resource for understanding and diagnosing medical conditions. Based on exposure to online content, people may develop undue health concerns, believing that common and benign symptoms are explained by serious illnesses. In ...
Evaluation of machine-learning protocols for technology-assisted review in electronic discovery
Gordon V. Cormack, Maura R. Grossman
Pages: 153-162
DOI: 10.1145/2600428.2609601

Using a novel evaluation toolkit that simulates a human reviewer in the loop, we compare the effectiveness of three machine-learning protocols for technology-assisted review as used in document review for discovery in legal proceedings. Our ...
ReQ-ReC: high recall retrieval with query pooling and interactive classification
Cheng Li, Yue Wang, Paul Resnick, Qiaozhu Mei
Pages: 163-172
DOI: 10.1145/2600428.2609618

We consider a scenario where a searcher requires both high precision and high recall from an interactive retrieval process. Such scenarios are very common in real life, exemplified by medical search, legal search, market research, and literature review. ...
SESSION: Session 2c: hashing and efficiency
Dawei Song
Supervised hashing with latent factor models
Peichao Zhang, Wei Zhang, Wu-Jun Li, Minyi Guo
Pages: 173-182
DOI: 10.1145/2600428.2609600

Due to its low storage cost and fast query speed, hashing has been widely adopted for approximate nearest neighbor search in large-scale datasets. Traditional hashing methods try to learn the hash codes in an unsupervised way where the metric (Euclidean) ...
Preference preserving hashing for efficient recommendation
Zhiwei Zhang, Qifan Wang, Lingyun Ruan, Luo Si
Pages: 183-192
DOI: 10.1145/2600428.2609578

Recommender systems usually need to compare a large number of items before users' most preferred ones can be found. This process can be very costly if recommendations are frequently made on large scale datasets. In this paper, a novel hashing algorithm, ...
Load balancing for partition-based similarity search
Xun Tang, Maha Alabduljalil, Xin Jin, Tao Yang
Pages: 193-202
DOI: 10.1145/2600428.2609624

All pairs similarity search, used in many data mining and information retrieval applications, is a time consuming process. Although a partition-based approach accelerates this process by simplifying parallelism management and avoiding unnecessary I/O ...
Estimating global statistics for unstructured P2P search in the presence of adversarial peers
Sami Richardson, Ingemar J. Cox
Pages: 203-212
DOI: 10.1145/2600428.2609567

A common problem in unstructured peer-to-peer (P2P) information retrieval is the need to compute global statistics of the full collection, when only a small subset of the collection is visible to a peer. Without accurate estimates of these statistics, ...
SESSION: Session 3a: Social media
Hui Fang
Hierarchical multi-label classification of social text streams
Zhaochun Ren, Maria-Hendrike Peetz, Shangsong Liang, Willemijn van Dolen, Maarten de Rijke
Pages: 213-222
DOI: 10.1145/2600428.2609595

Hierarchical multi-label classification assigns a document to multiple hierarchical classes. In this paper we focus on hierarchical multi-label classification of social text streams. Concept drift, complicated relations among classes, and the limited ...
An adaptive teleportation random walk model for learning social tag relevance
Xiaofei Zhu, Wolfgang Nejdl, Mihai Georgescu
Pages: 223-232
DOI: 10.1145/2600428.2609556

Social tags are known to be a valuable source of information for image retrieval and organization. However, contrary to the conventional document retrieval, rich tag frequency information in social sharing systems, such as Flickr, is not available, thus ...
Predicting the popularity of web 2.0 items based on user comments
Xiangnan He, Ming Gao, Min-Yen Kan, Yiqun Liu, Kazunari Sugiyama
Pages: 233-242
DOI: 10.1145/2600428.2609558

In the current Web 2.0 era, the popularity of Web resources fluctuates ephemerally, based on trends and social interest. As a result, content-based relevance signals are insufficient to meet users' constantly evolving information needs in searching for ...
Recommending social media content to community owners
Inbal Ronen, Ido Guy, Elad Kravi, Maya Barnea
Pages: 243-252
DOI: 10.1145/2600428.2609596

Online communities within the enterprise offer their leaders an easy and accessible way to attract, engage, and influence others. Our research studies the recommendation of social media content to leaders (owners) of online communities within the enterprise. ...
SESSION: Session 3b: indexing and efficiency
Alistair Moffat
Predictive parallelization: taming tail latencies in web search
Myeongjae Jeon, Saehoon Kim, Seung-won Hwang, Yuxiong He, Sameh Elnikety, Alan L. Cox, Scott Rixner
Pages: 253-262
DOI: 10.1145/2600428.2609572

Web search engines are optimized to reduce the high-percentile response time to consistently provide fast responses to almost all user queries. This is a challenging task because the query workload exhibits large variability, consisting of many short-running ...
Skewed partial bitvectors for list intersection
Andrew Kane, Frank Wm. Tompa
Pages: 263-272
DOI: 10.1145/2600428.2609609

This paper examines the space-time performance of in-memory conjunctive list intersection algorithms, as used in search engines, where integers represent document identifiers. We demonstrate that the combination of bitvectors, large skips, delta compressed ...
Partitioned Elias-Fano indexes
Giuseppe Ottaviano, Rossano Venturini
Pages: 273-282
DOI: 10.1145/2600428.2609615

The Elias-Fano representation of monotone sequences has been recently applied to the compression of inverted indexes, showing excellent query performance thanks to its efficient random access and search operations. While its space occupancy is ...
Principled dictionary pruning for low-memory corpus compression
Jiancong Tong, Anthony Wirth, Justin Zobel
Pages: 283-292
DOI: 10.1145/2600428.2609576

Compression of collections, such as text databases, can both reduce space consumption and increase retrieval efficiency, through better caching and better exploitation of the memory hierarchy. A promising technique is relative Lempel-Ziv coding, in which ...
SESSION: Session 3c: e pluribus unum
Bruce Croft
Learning for search result diversification
Yadong Zhu, Yanyan Lan, Jiafeng Guo, Xueqi Cheng, Shuzi Niu
Pages: 293-302
DOI: 10.1145/2600428.2609634

Search result diversification has gained attention as a way to tackle the ambiguous or multi-faceted information needs of users. Most existing methods on this problem utilize a heuristic predefined ranking function, where limited features can be incorporated ...
Fusion helps diversification
Shangsong Liang, Zhaochun Ren, Maarten de Rijke
Pages: 303-312
DOI: 10.1145/2600428.2609561

A popular strategy for search result diversification is to first retrieve a set of documents utilizing a standard retrieval method and then rerank the results. We adopt a different perspective on the problem, based on data fusion. Starting from the hypothesis ...
Utilizing relevance feedback in fusion-based retrieval
Ella Rabinovich, Ofri Rom, Oren Kurland
Pages: 313-322
DOI: 10.1145/2600428.2609573

Work on using relevance feedback for retrieval has focused on the single retrieved list setting. That is, an initial document list is retrieved in response to the query and feedback for the most highly ranked documents is used to perform a second search. ...
A simple term frequency transformation model for effective pseudo relevance feedback
Zheng Ye, Jimmy Xiangji Huang
Pages: 323-332
DOI: 10.1145/2600428.2609636

Pseudo Relevance Feedback is an effective technique to improve the performance of ad-hoc information retrieval. Traditionally, the expansion terms are extracted either according to the term distributions in the feedback documents; or according to both ...
SESSION: Plenary address
Shlomo Geva
Seeking simplicity in search user interfaces
Marti A. Hearst
Pages: 333-334
DOI: 10.1145/2600428.2617558

It is rare for a new user interface to break through and become successful, especially in information-intensive tasks like search, coming to consensus or building up knowledge. Most complex interfaces end up going unused. Often the successful solution ...
SESSION: Session 4a: think globally, act locally
Matt Lease
Who is the barbecue king of Texas?: a geo-spatial approach to finding local experts on Twitter
Zhiyuan Cheng, James Caverlee, Himanshu Barthwal, Vandana Bachani
Pages: 335-344
DOI: 10.1145/2600428.2609580

This paper addresses the problem of identifying local experts in social media systems like Twitter. Local experts, in contrast to general topic experts, have specialized knowledge focused around a particular location, and are important for many applications ...
Your neighbors affect your ratings: on geographical neighborhood influence to rating prediction
Longke Hu, Aixin Sun, Yong Liu
Pages: 345-354
DOI: 10.1145/2600428.2609593

Rating prediction is to predict the preference rating of a user to an item that she has not rated before. Using the business review data from Yelp, in this paper, we study business rating prediction. A business here can be a restaurant, a shopping mall ...
Processing spatial keyword query as a top-k aggregation query
Dongxiang Zhang, Chee-Yong Chan, Kian-Lee Tan
Pages: 355-364
DOI: 10.1145/2600428.2609562

We examine the spatial keyword search problem to retrieve objects of interest that are ranked based on both their spatial proximity to the query location as well as the textual relevance of the object's keywords. Existing solutions for the problem are ...
SESSION: Session 4b: scientia potentia est
Isabelle Moulinier
Entity query feature expansion using knowledge base links
Jeffrey Dalton, Laura Dietz, James Allan
Pages: 365-374
DOI: 10.1145/2600428.2609628

Recent advances in automatic entity linking and knowledge base construction have resulted in entity annotations for document and query collections. For example, annotations of entities from large general purpose knowledge bases, such as Freebase and ...
QUADS: question answering for decision support
Zi Yang, Ying Li, James Cai, Eric Nyberg
Pages: 375-384
DOI: 10.1145/2600428.2609606

As the scale of available on-line data grows ever larger, individuals and businesses must cope with increasing complexity in decision-making processes which utilize large volumes of unstructured, semi-structured and/or structured data to satisfy multiple, ...
Topic labeled text classification: a weakly supervised approach
Swapnil Hingmire, Sutanu Chakraborti
Pages: 385-394
DOI: 10.1145/2600428.2609565

Supervised text classifiers require extensive human expertise and labeling efforts. In this paper, we propose a weakly supervised text classification algorithm based on the labeling of Latent Dirichlet Allocation (LDA) topics. Our algorithm is based ...
SESSION: Session 4c: more hashing
Mark Sanderson
Discriminative coupled dictionary hashing for fast cross-media retrieval
Zhou Yu, Fei Wu, Yi Yang, Qi Tian, Jiebo Luo, Yueting Zhuang
Pages: 395-404
DOI: 10.1145/2600428.2609563

Cross-media hashing, which conducts cross-media retrieval by embedding data from different modalities into a common low-dimensional Hamming space, has attracted intensive attention in recent years. The existing cross-media hashing approaches only aim ...
Active hashing with joint data example and tag selection
Qifan Wang, Luo Si, Zhiwei Zhang, Ning Zhang
Pages: 405-414
DOI: 10.1145/2600428.2609590

Similarity search is an important problem in many large scale applications such as image and text retrieval. Hashing methods have become popular for similarity search due to their fast search speed and low storage cost. Recent research has shown that hashing ...
Latent semantic sparse hashing for cross-modal similarity search
Jile Zhou, Guiguang Ding, Yuchen Guo
Pages: 415-424
DOI: 10.1145/2600428.2609610

Similarity search methods based on hashing for effective and efficient cross-modal retrieval on large-scale multimedia databases with massive text and images have attracted considerable attention. The core problem of cross-modal hashing is how to effectively ...
SESSION: Session 5a: brains!!!
Mark Smucker
Predicting term-relevance from brain signals
Manuel J.A. Eugster, Tuukka Ruotsalo, Michiel M. Spapé, Ilkka Kosunen, Oswald Barral, Niklas Ravaja, Giulio Jacucci, Samuel Kaski
Pages: 425-434
DOI: 10.1145/2600428.2609594

Term-Relevance Prediction from Brain Signals (TRPB) is proposed to automatically detect relevance of text information directly from brain signals. An experiment with forty participants was conducted to record neural activity of participants while providing ...
Multidimensional relevance modeling via psychometrics and crowdsourcing
Yinglong Zhang, Jin Zhang, Matthew Lease, Jacek Gwizdka
Pages: 435-444
DOI: 10.1145/2600428.2609577

While many multidimensional models of relevance have been posited, prior studies have been largely exploratory rather than confirmatory. Lacking a methodological framework to quantify the relationships among factors or measure model fit to observed data, ...
SESSION: Session 5b0: auto-completion
Jimmy Lin
Learning user reformulation behavior for query auto-completion
Jyun-Yu Jiang, Yen-Yu Ke, Pao-Yu Chien, Pu-Jen Cheng
Pages: 445-454
DOI: 10.1145/2600428.2609614

It is crucial for query auto-completion to accurately predict what a user is typing. Given a query prefix and its context (e.g., previous queries), conventional context-aware approaches often produce relevant queries to the context. The purpose of this ...
A two-dimensional click model for query auto-completion
Yanen Li, Anlei Dong, Hongning Wang, Hongbo Deng, Yi Chang, ChengXiang Zhai
Pages: 455-464
DOI: 10.1145/2600428.2609571

Query auto-completion (QAC) facilitates faster user query input by predicting users' intended queries. Most QAC algorithms take a learning-based approach to incorporate various signals for query relevance prediction. However, such models are trained ...
SESSION: Session 5b1: how to win friends and influence people
Jimmy Lin
On measuring social friend interest similarities in recommender systems
Hao Ma
Pages: 465-474
DOI: 10.1145/2600428.2609635

Social recommender systems have become an emerging research topic due to the prevalence of online social networking services during the past few years. In this paper, aiming at providing fundamental support to the research of social recommendation problem, ...
IMRank: influence maximization via finding self-consistent ranking
Suqi Cheng, Huawei Shen, Junming Huang, Wei Chen, Xueqi Cheng
Pages: 475-484
DOI: 10.1145/2600428.2609592

Influence maximization, fundamental for word-of-mouth marketing and viral marketing, aims to find a set of seed nodes maximizing influence spread on social network. Early methods mainly fall into two paradigms with certain benefits and drawbacks: (1) ...
SESSION: Session 5c: collaborative complex personalization
Jimmy Huang
User-driven system-mediated collaborative information retrieval
Laure Soulier, Chirag Shah, Lynda Tamine
Pages: 485-494
DOI: 10.1145/2600428.2609598

Most of the previous approaches surrounding collaborative information retrieval (CIR) provide either a user-based mediation, in which the system only supports users' collaborative activities, or a system-based mediation, in which the system plays an ...
SearchPanel: framing complex search needs
Pernilla Qvarfordt, Simon Tretter, Gene Golovchinsky, Tony Dunnigan
Pages: 495-504
DOI: 10.1145/2600428.2609620

People often use more than one query when searching for information. They revisit search results to re-find information and build an understanding of their search need through iterative explorations of query formulation. These tasks are not well-supported ...
Cohort modeling for enhanced personalized search
Jinyun Yan, Wei Chu, Ryen W. White
Pages: 505-514
DOI: 10.1145/2600428.2609617

Web search engines utilize behavioral signals to develop search experiences tailored to individual users. To be effective, such personalization relies on access to sufficient information about each user's interests and intentions. For new users or new ...
Characterizing multi-click search behavior and the risks and opportunities of changing results during use
Chia-Jung Lee, Jaime Teevan, Sebastian de la Chica
Pages: 515-524
DOI: 10.1145/2600428.2609588

Although searchers often click on more than one result following a query, little is known about how they interact with search results after their first click. Using large scale query log analysis, we characterize what people do when they return to a ...
SESSION: Plenary address
Andrew Trotman
The data revolution: how companies are transforming with big data
Hugh E. Williams
Pages: 525-526
DOI: 10.1145/2600428.2617559

Spelling correction in the 1990s was all about algorithms and small dictionaries. This century, it is about mining vast data sets of past user behaviors, simple algorithms, and using those to correct mistakes. The large Internet giants are data-driven ...
SESSION: Session 6a: #moremicroblog #sigir2014
ChengXiang Zhai
Learning similarity functions for topic detection in online reputation monitoring
Damiano Spina, Julio Gonzalo, Enrique Amigó
Pages: 527-536
DOI: 10.1145/2600428.2609621

Reputation management experts have to monitor, among other sources, Twitter constantly and decide, at any given time, what is being said about the entity of interest (a company, organization, personality...). Solving this reputation monitoring problem automatically ...
Predicting trending messages and diffusion participants in microblogging network
Jingwen Bian, Yang Yang, Tat-Seng Chua
Pages: 537-546
DOI: 10.1145/2600428.2609616

Microblogging services have emerged as an essential way to strengthen the communications among individuals. One of the most important features of microblog over traditional social networks is the extensive proliferation in information diffusion. As the ...
Leveraging knowledge across media for spammer detection in microblogging
Xia Hu, Jiliang Tang, Huan Liu
Pages: 547-556
DOI: 10.1145/2600428.2609632

While microblogging has emerged as an important information sharing and communication platform, it has also become a convenient venue for spammers to overwhelm other users with unwanted content. Currently, spammer detection in microblogging focuses on ...
SESSION: Session 6b: scents and sensibility
Doug Oard
Using information scent and need for cognition to understand online search behavior
Wan-Ching Wu, Diane Kelly, Avneesh Sud
Pages: 557-566
DOI: 10.1145/2600428.2609626

The purpose of this study is to investigate the extent to which two theories, Information Scent and Need for Cognition, explain people's search behaviors when interacting with search engine results pages (SERPs). Information Scent, the perception of ...
Discrimination between tasks with user activity patterns during information search
Michael J. Cole, Chathra Hendahewa, Nicholas J. Belkin, Chirag Shah
Pages: 567-576
DOI: 10.1145/2600428.2609591

Can the activity patterns of page use during information search sessions discriminate between different types of information seeking tasks? We model sequences of interactions with search result and content pages during information search sessions. Two ...
Investigating users' query formulations for cognitive search intents
Makoto P. Kato, Takehiro Yamamoto, Hiroaki Ohshima, Katsumi Tanaka
Pages: 577-586
DOI: 10.1145/2600428.2609566

This study investigated query formulations by users with Cognitive Search Intents (CSIs), which are users' needs for the cognitive characteristics of documents to be retrieved, e.g., comprehensibility, subjectivity, and concreteness. Our four ...
SESSION: Session 6c: users vs. models
Ricardo Baeza-Yates
Win-win search: dual-agent stochastic game in session search
Jiyun Luo, Sicong Zhang, Hui Yang
Pages: 587-596
DOI: 10.1145/2600428.2609629

Session search is a complex search task that involves multiple search iterations triggered by query reformulations. We observe a Markov chain in session search: user's judgment of retrieved documents in the previous search iteration affects user's actions ...
Injecting user models and time into precision via Markov chains
Marco Ferrante, Nicola Ferro, Maria Maistro
Pages: 597-606
DOI: 10.1145/2600428.2609637

We propose a family of new evaluation measures, called Markov Precision (MP), which exploits continuous-time and discrete-time Markov chains in order to inject user models into precision. Continuous-time MP behaves like time-calibrated measures, bringing ...
Searching, browsing, and clicking in a search session: changes in user behavior by task and over time
Jiepu Jiang, Daqing He, James Allan
Pages: 607-616
DOI: 10.1145/2600428.2609633

There are many existing studies of user behavior in simple tasks (e.g., navigational and informational search) within a short duration of 1-2 queries. However, we know relatively little about user behavior, especially browsing and clicking behavior, ...
SESSION: Session 7a: sentiments
Kevyn Collins-Thompson
Coarse-to-fine review selection via supervised joint aspect and sentiment model
Zhen Hai, Gao Cong, Kuiyu Chang, Wenting Liu, Peng Cheng
Pages: 617-626
doi:10.1145/2600428.2609570

Online reviews are immensely valuable for customers to make informed purchase decisions and for businesses to improve the quality of their products and services. However, customer reviews grow exponentially while varying greatly in quality. It is generally ...
Cross-domain and cross-category emotion tagging for comments of online news
Ying Zhang, Ning Zhang, Luo Si, Yanshan Lu, Qifan Wang, Xiaojie Yuan
Pages: 627-636
doi:10.1145/2600428.2609587

In many online news services, users often write comments on news articles expressing subjective emotions such as sadness, happiness, or anger. Knowing such emotions can help us understand the preferences and perspectives of individual users, and therefore may facilitate ...
Economically-efficient sentiment stream analysis
Roberto Lourenco Jr., Adriano Veloso, Adriano Pereira, Wagner Meira Jr., Renato Ferreira, Srinivasan Parthasarathy
Pages: 637-646
doi:10.1145/2600428.2609612

Text-based social media channels, such as Twitter, produce torrents of opinionated data about the most diverse topics and entities. The analysis of such data (a.k.a. sentiment analysis) is quickly becoming a key feature in recommender systems and search ...
SESSION: Session 7b: more like those
Yi Zhang
New and improved: modeling versions to improve app recommendation
Jovian Lin, Kazunari Sugiyama, Min-Yen Kan, Tat-Seng Chua
Pages: 647-656
doi:10.1145/2600428.2609560

Existing recommender systems usually model items as static, i.e., unchanging in attributes, description, and features. However, in domains such as mobile apps, a version update may introduce substantial changes to an app, reflected by an increment ...
Bundle recommendation in ecommerce
Tao Zhu, Patrick Harrington, Junjun Li, Lei Tang
Pages: 657-666
doi:10.1145/2600428.2609603

Recommender systems have become an important component of modern eCommerce. Recent research on recommender systems has mainly concentrated on improving the relevance or profitability of individual recommended items. But in reality, users are usually ...
Does product recommendation meet its Waterloo in unexplored categories?: no, price comes to help
Jia Chen, Qin Jin, Shiwan Zhao, Shenghua Bao, Li Zhang, Zhong Su, Yong Yu
Pages: 667-676
doi:10.1145/2600428.2609608

State-of-the-art methods for product recommendation suffer a significant performance drop in categories where a user has no purchase history. This problem needs to be addressed, since current online retailers are moving beyond a single category and attempting ...
SESSION: Session 7c: signs and symbols
Jaap Kamps
Query expansion for mixed-script information retrieval
Parth Gupta, Kalika Bali, Rafael E. Banchs, Monojit Choudhury, Paolo Rosso
Pages: 677-686
doi:10.1145/2600428.2609622

For many languages that use non-Roman based indigenous scripts (e.g., Arabic, Greek and Indic languages), one can often find a large amount of user-generated transliterated content on the Web in the Roman script. Such content creates a monolingual or ...
Retrieval of similar chess positions
Debasis Ganguly, Johannes Leveling, Gareth J.F. Jones
Pages: 687-696
doi:10.1145/2600428.2609605

We address the problem of retrieving chess game positions similar to a given query position from a collection of archived chess games. We investigate this problem from an information retrieval (IR) perspective. The advantage of our proposed IR-based ...
A mathematics retrieval system for formulae in layout presentations
Xiaoyan Lin, Liangcai Gao, Xuan Hu, Zhi Tang, Yingnan Xiao, Xiaozhong Liu
Pages: 697-706
doi:10.1145/2600428.2609611

The semantics of mathematical formulae depend on their spatial structure, and formulae usually exist in layout presentations such as PDF, LaTeX, and Presentation MathML, which poses challenges for previous text indexing and retrieval methods. This paper proposes an innovative ...
SESSION: Session 8a: picture this
Grace Hui Yang
The knowing camera 2: recognizing and annotating places-of-interest in smartphone photos
Pai Peng, Lidan Shou, Ke Chen, Gang Chen, Sai Wu
Pages: 707-716
doi:10.1145/2600428.2609557

This paper presents a project called Knowing Camera for recognizing and annotating places of interest (POIs) in smartphone photos in real time, exploiting the availability of online geotagged images of such places. We propose a "Spatial+Visual" (S+V) framework ...
Click-through-based cross-view learning for image search
Yingwei Pan, Ting Yao, Tao Mei, Houqiang Li, Chong-Wah Ngo, Yong Rui
Pages: 717-726
doi:10.1145/2600428.2609568

One of the fundamental problems in image search is to rank image documents according to a given textual query. Existing search engines rely heavily on surrounding text for ranking images, or leverage the query-image pairs annotated by human labelers ...
Learning to personalize trending image search suggestion
Chun-Che Wu, Tao Mei, Winston H. Hsu, Yong Rui
Pages: 727-736
doi:10.1145/2600428.2609569

Trending search suggestion is leading a new paradigm of image search, where the user's exploratory search experience is facilitated by the automatic suggestion of trending queries. Existing image search engines, however, only provide general suggestions ...
PRISM: concept-preserving social image search results summarization
Boon-Siew Seah, Sourav S. Bhowmick, Aixin Sun
Pages: 737-746
doi:10.1145/2600428.2609586

Most existing tag-based social image search engines present search results as a ranked list of images, which cannot be consumed by users in a natural and intuitive manner. In this paper, we present a novel concept-preserving image search results summarization ...
SESSION: Session 8b: time and tide
Oren Kurland
Time-critical search
Nina Mishra, Ryen W. White, Samuel Ieong, Eric Horvitz
Pages: 747-756
doi:10.1145/2600428.2609613

We study time-critical search, where users have urgent information needs in the context of an acute problem. As examples, users may need to know how to stem a severe bleed, help a baby who is choking on a foreign object, or respond to an epileptic seizure. ...
Learning temporal-dependent ranking models
Miguel Costa, Francisco Couto, Mário Silva
Pages: 757-766
doi:10.1145/2600428.2609619

Together, web archives already hold more than 534 billion files, and this number continues to grow as new initiatives arise. Searching all versions of these files acquired over time is challenging, since users expect answers as fast and precise ...
Web page segmentation with structured prediction and its application in web page classification
Lidong Bing, Rui Guo, Wai Lam, Zheng-Yu Niu, Haifeng Wang
Pages: 767-776
doi:10.1145/2600428.2609630

We propose a framework which can perform Web page segmentation with a structured prediction approach. It formulates the segmentation task as a structured labeling problem on a transformed Web page segmentation graph (WPS-graph). WPS-graph models the ...
Query log driven web search results clustering
Jose G. Moreno, Gaël Dias, Guillaume Cleuziou
Pages: 777-786
doi:10.1145/2600428.2609583

Several important studies in Web search results clustering have recently shown improved performance through the use of external resources. Following this trend, we present a new algorithm called Dual C-Means, which provides a theoretical background ...
SESSION: Session 8c0: summaries and semantics
Paul Bennett
CTSUM: extracting more certain summaries for news articles
Xiaojun Wan, Jianmin Zhang
Pages: 787-796
doi:10.1145/2600428.2609559

People often read summaries of news articles in order to get reliable information about an event or a topic. However, the information expressed in news articles is not always certain, and some sentences contain uncertain information about the event. ...
Continuous word embeddings for detecting local text reuses at the semantic level
Qi Zhang, Jihua Kang, Jin Qian, Xuanjing Huang
Pages: 797-806
doi:10.1145/2600428.2609597

Text reuse is a common phenomenon in a variety of user-generated content. With the rapid expansion of social media, reuses of local text are occurring much more frequently than ever before. The task of detecting these local reuses serves as an ...
SESSION: Session 8C1: [citation] recommendation
Paul Bennett
CiteSight: supporting contextual citation recommendation using differential search
Avishay Livne, Vivek Gokuladas, Jaime Teevan, Susan T. Dumais, Eytan Adar
Pages: 807-816
doi:10.1145/2600428.2609585

A person often uses a single search engine for very different tasks. For example, an author editing a manuscript may use the same academic search engine to find the latest work on a particular topic or to find the correct citation for a familiar article. ...
Cross-language context-aware citation recommendation in scientific articles
Xuewei Tang, Xiaojun Wan, Xun Zhang
Pages: 817-826
doi:10.1145/2600428.2609564

Adequacy of citations is very important for a scientific paper. However, it is not an easy job to find appropriate citations for a given context, especially for citations in different languages. In this paper, we define a novel task of cross-language ...
POSTER SESSION: Poster session (short papers)
Search result diversification via data fusion
Shengli Wu, Chunlan Huang
Pages: 827-830
doi:10.1145/2600428.2609451

In recent years, researchers have investigated search result diversification through a variety of approaches. In such situations, information retrieval systems need to consider both aspects of relevance and diversity for those retrieved documents. On ...
Hashtag recommendation for hyperlinked tweets
Surendra Sedhai, Aixin Sun
Pages: 831-834
doi:10.1145/2600428.2609452

The presence of a hyperlink in a tweet is a strong indication that the tweet is more informative. In this paper, we study the problem of hashtag recommendation for hyperlinked tweets (i.e., tweets containing links to Web pages). By recommending hashtags to hyperlinked ...
Personalized document re-ranking based on Bayesian probabilistic matrix factorization
Fei Cai, Shangsong Liang, Maarten de Rijke
Pages: 835-838
doi:10.1145/2600428.2609453

A query considered in isolation provides limited information about the searcher's interest. Previous work has considered various types of user behavior, e.g., clicks and dwell time, to obtain a better understanding of the user's intent. We consider the ...
Using the cross-entropy method to re-rank search results
Haggai Roitman, Shay Hummel, Oren Kurland
Pages: 839-842
doi:10.1145/2600428.2609454

We present a novel unsupervised approach to re-ranking an initially retrieved list. The approach is based on the Cross Entropy method applied to permutations of the list, and relies on performance prediction. Using pseudo predictors we establish a lower ...
Computing and applying topic-level user interactions in microblog recommendation
Xiao Lu, Peng Li, Hongyuan Ma, Shuxin Wang, Anying Xu, Bin Wang
Pages: 843-846
doi:10.1145/2600428.2609455

With the development of microblog services, tens of thousands of messages are produced every day and recommending useful messages according to users' interest is recognized as an effective way to overcome the information overload problem. Collaborative ...
Towards context-aware search with right click
Aixin Sun, Chii-Hian Lou
Pages: 847-850
doi:10.1145/2600428.2609456

Many queries are submitted to search engines by right-clicking the marked text (i.e., the query) in Web browsers. Because the document being read by the searcher often provides sufficient contextual information for the query, the search engine could provide ...
Rendering expressions to improve accuracy of relevance assessment for math search
Matthias S. Reichenbach, Anurag Agarwal, Richard Zanibbi
Pages: 851-854
doi:10.1145/2600428.2609457

Finding ways to help users assess relevance when they search using math expressions is critical for making Mathematical Information Retrieval (MIR) systems easier to use. We designed a study where participants completed search tasks involving mathematical ...
Exploring recommendations in Internet of Things
Lina Yao, Quan Z. Sheng, Anne H.H. Ngu, Helen Ashman, Xue Li
Pages: 855-858
doi:10.1145/2600428.2609458

With recent advances in radio-frequency identification (RFID), wireless sensor networks, and Web-based services, physical things are becoming an integral part of the emerging ubiquitous Web. In this paper, we focus on the things recommendation problem ...
Sig-SR: SimRank search over singular graphs
Weiren Yu, Julie A. McCann
Pages: 859-862
doi:10.1145/2600428.2609459

SimRank is an attractive structural-context measure of similarity between two objects in a graph. It recursively follows the intuition that "two objects are similar if they are referenced by similar objects". The best known matrix-based method [1] for ...
Old dogs are great at new tricks: column stores for IR prototyping
Hannes Mühleisen, Thaer Samar, Jimmy Lin, Arjen de Vries
Pages: 863-866
doi:10.1145/2600428.2609460

We make the suggestion that instead of implementing custom index structures and query evaluation algorithms, IR researchers should simply store document representations in a column-oriented relational database and implement ranking models using SQL. ...
The role of network distance in LinkedIn people search
Shih-Wen Huang, Daniel Tunkelang, Karrie Karahalios
Pages: 867-870
doi:10.1145/2600428.2609461

LinkedIn is the world's largest professional network, with over 300 million members. One of the primary activities on the site is people search, for which LinkedIn members are both the users and the corpus. This paper presents insights about people search ...
Latent community discovery through enterprise user search query modeling
Kevin M. Carter, Rajmonda S. Caceres, Ben Priest
Pages: 871-874
doi:10.1145/2600428.2609462

Enterprise computer networks are filled with users performing a variety of tasks, ranging from business-critical tasks to personal interest browsing. Due to this multi-modal distribution of behaviors, it is non-trivial to automatically discern which ...
Examining collaborative query reformulation: a case of travel information searching
Abu Shamim Mohammad Arif, Jia Tina Du, Ivan Lee
Pages: 875-878
doi:10.1145/2600428.2609463

Users often reformulate or modify their queries when they search for information, particularly when the search task is complex and exploratory. This paper investigates query reformulation behavior in collaborative tourism information searching ...
Influential nodes selection: a data reconstruction perspective
Zhefeng Wang, Hao Wang, Qi Liu, Enhong Chen
Pages: 879-882
doi:10.1145/2600428.2609464

Influence maximization is the problem of finding a set of seed nodes in a social network that maximizes the spread of influence. Traditionally, researchers view influence propagation as a stochastic process and formulate the influence maximization problem ...
A fusion approach to cluster labeling
Haggai Roitman, Shay Hummel, Michal Shmueli-Scheuer
Pages: 883-886
doi:10.1145/2600428.2609465

We present a novel approach to the cluster labeling task using fusion methods. The core idea of our approach is to weigh labels, suggested by any labeler, according to the labeler's estimated decisiveness with respect to each of its suggested labels. ...
Evaluating the effort involved in relevance assessments for images
Martin Halvey, Robert Villa
Pages: 887-890
doi:10.1145/2600428.2609466

How assessors and end users judge the relevance of images has been studied in information science and information retrieval for a considerable time. The criteria by which assessors judge relevance have been intensively studied, and there has been a large ...
Diversifying query suggestions based on query documents
Youngho Kim, W. Bruce Croft
Pages: 891-894
doi:10.1145/2600428.2609467

Many domain-specific search tasks are initiated by document-length queries, e.g., patent invalidity search aims to find prior art related to a new (query) patent. We call this type of search Query Document Search. In this type of search, the initial ...
Comparing client and server dwell time estimates for click-level satisfaction prediction
Youngho Kim, Ahmed Hassan, Ryen W. White, Imed Zitouni
Pages: 895-898
doi:10.1145/2600428.2609468

Click dwell time is the amount of time that a user spends on a clicked search result. Many previous studies have shown that click dwell time is strongly correlated with result-level satisfaction and document relevance. Accurate estimates of dwell time ...
Score-safe term-dependency processing with hybrid indexes
Matthias Petri, Alistair Moffat, J. Shane Culpepper
Pages: 899-902
doi:10.1145/2600428.2609469

Score-safe index processing has received a great deal of attention over the last two decades. By pre-calculating maximum term impacts during indexing, the number of scoring operations can be minimized, and the top-k documents for a query can be located ...
Co-training on authorship attribution with very few labeled examples: methods vs. views
Tieyun Qian, Bing Liu, Ming Zhong, Guoliang He
Pages: 903-906
doi:10.1145/2600428.2609470

Authorship attribution (AA) aims to identify the authors of a set of documents. Traditional studies in this area often assume that a large set of labeled documents is available for training. However, in real life, it is hard or expensive to ...
Probabilistic text modeling with orthogonalized topics
Enpeng Yao, Guoqing Zheng, Ou Jin, Shenghua Bao, Kailong Chen, Zhong Su, Yong Yu
Pages: 907-910
doi:10.1145/2600428.2609471

Topic models have been widely used for text analysis. Previous topic models have enjoyed great success in mining the latent topic structure of text documents. With many efforts made on endowing the resulting document-topic distributions with different ...
Evaluating non-deterministic retrieval systems
Gaya K. Jayasinghe, William Webber, Mark Sanderson, Lasitha S. Dharmasena, J. Shane Culpepper
Pages: 911-914
doi:10.1145/2600428.2609472

The use of sampling, randomized algorithms, or training based on the unpredictable inputs of users in Information Retrieval often leads to non-deterministic outputs. Evaluating the effectiveness of systems incorporating these methods can be challenging ...
Extending test collection pools without manual runs
Gaya K. Jayasinghe, William Webber, Mark Sanderson, J. Shane Culpepper
Pages: 915-918
doi:10.1145/2600428.2609473

Information retrieval test collections traditionally use a combination of automatic and manual runs to create a pool of documents to be judged. The quality of the final judgments produced for a collection is a product of the variety across each of the ...
The search duel: a response to a strong ranker
Peter Izsak, Fiana Raiber, Oren Kurland, Moshe Tennenholtz
Pages: 919-922
doi:10.1145/2600428.2609474

How can a search engine with a relatively weak relevance ranking function compete with a search engine that has a much stronger ranking function? This dual challenge, which to the best of our knowledge has not been addressed in previous work, entails ...
Modeling the evolution of product entities
Priya Radhakrishnan, Manish Gupta, Vasudeva Varma
Pages: 923-926
doi:10.1145/2600428.2609475

A large number of web queries are related to product entities. Studying evolution of product entities can help analysts understand the change in particular attribute values for these products. However, studying the evolution of a product requires us ...
Predicting bursts and popularity of hashtags in real-time
Shoubin Kong, Qiaozhu Mei, Ling Feng, Fei Ye, Zhe Zhao
Pages: 927-930
doi:10.1145/2600428.2609476

Hashtags have been widely used to annotate topics in tweets (short posts on Twitter.com). In this paper, we study the problems of real-time prediction of bursting hashtags. Will a hashtag burst in the near future? If it will, how early can we predict ...
Probabilistic ensemble learning for Vietnamese word segmentation
Wuying Liu, Li Lin
Pages: 931-934
doi:10.1145/2600428.2609477

Word segmentation is a challenging issue, and the corresponding algorithms can be used in many applications of natural language processing. This paper addresses the problem of Vietnamese word segmentation, proposes a probabilistic ensemble learning (PEL) ...
Improving unsupervised query segmentation using parts-of-speech sequence information
Rishiraj Saha Roy, Yogarshi Vyas, Niloy Ganguly, Monojit Choudhury
Pages: 935-938
doi:10.1145/2600428.2609478

We present a generic method for augmenting unsupervised query segmentation by incorporating Parts-of-Speech (POS) sequence information to detect meaningful but rare n-grams. Our initial experiments with an existing English POS tagger employing two different ...
Building a query log via crowdsourcing
Omar Alonso, Maria Stone
Pages: 939-942
doi:10.1145/2600428.2609479

A query log is a key asset in a commercial search engine. Every day, millions of users rely on search engines to find information on the Web by entering a few keywords into a simple search interface. Those queries represent a subset of user behavioral data ...
Web search without 'stupid' results
Aleksandra Lomakina, Nikita Povarov, Pavel Serdyukov
Pages: 943-946
doi:10.1145/2600428.2609480

One of the main targets of any search engine is to make every user fully satisfied with her search results. For this reason, much effort is devoted to improving ranking models in order to show the best results to users. However, there is a class ...
Discovering real-world use cases for a multimodal math search interface
Keita Del Valle Wangari, Richard Zanibbi, Anurag Agarwal
Pages: 947-950
doi:10.1145/2600428.2609481

To use math expressions in search, current search engines require knowing expression names or using a structure editor or string encoding (e.g., LaTeX). For mathematical non-experts, this can lead to an "intention gap" between the query they wish to ...
Improving search personalisation with dynamic group formation
Thanh Tien Vu, Dawei Song, Alistair Willis, Son Ngoc Tran, Jingfei Li
Pages: 951-954
doi:10.1145/2600428.2609482

Recent research has shown that the performance of search engines can be improved by enriching a user's personal profile with information about other users with shared interests. In the existing approaches, groups of similar users are often statically ...
Detection of abnormal profiles on group attacks in recommender systems
Wei Zhou, Yun Sing Koh, Junhao Wen, Shafiq Alam, Gillian Dobbie
Pages: 955-958
doi:10.1145/2600428.2609483

Recommender systems using Collaborative Filtering techniques are capable of making personalized predictions. However, these systems are highly vulnerable to profile injection attacks. Group attacks are attacks that target a group of items instead of one, ...
On run diversity in Evaluation as a Service
Ellen M. Voorhees, Jimmy Lin, Miles Efron
Pages: 959-962
doi:10.1145/2600428.2609484

"Evaluation as a service" (EaaS) is a new methodology that enables community-wide evaluations and the construction of test collections on documents that cannot be distributed. The basic idea is that evaluation organizers provide a service API through ...
Evaluating answer passages using summarization measures
Mostafa Keikha, Jae Hyun Park, W. Bruce Croft
Pages: 963-966
doi:10.1145/2600428.2609485

Passage-based retrieval models have been studied for some time and have been shown to have some benefits for document ranking. Finding passages that are not only topically relevant, but are also answers to the users' questions would have a significant ...
Analyzing bias in CQA-based expert finding test sets
Reyyan Yeniterzi, Jamie Callan
Pages: 967-970
doi:10.1145/2600428.2609486

Data retrieved from community question answering (CQA) sites, such as content and users' assessments of content, is commonly used for expertise estimation related tasks. One such task, in which the received votes are directly used as graded relevance ...
Understanding negation and family history to improve clinical information retrieval
Bevan Koopman, Guido Zuccon
Pages: 971-974
doi:10.1145/2600428.2609487

We present a study to understand the effect that negated terms (e.g., "no fever") and family history (e.g., "family history of diabetes") have on searching clinical records. Our analysis is aimed at devising the most effective means of handling negation ...
Modeling dual role preferences for trust-aware recommendation
Weilong Yao, Jing He, Guangyan Huang, Yanchun Zhang
Pages: 975-978
doi:10.1145/2600428.2609488

Unlike general recommendation scenarios, where a user has only a single role, users in a trust rating network, e.g., Epinions, are associated with two different roles simultaneously: as a truster and as a trustee. With different roles, users can show ...
Mouse movement during relevance judging: implications for determining user attention
Mark D. Smucker, Xiaoyu Sunny Guo, Andrew Toulis
Pages: 979-982
doi:10.1145/2600428.2609489

Several researchers have found that a user's mouse position gives an indication of the user's gaze during web search and other tasks. As part of a user study that involved relevance judging of document summaries and full documents, we recorded users' ...
PAAP: prefetch-aware admission policies for query results cache in web search engines
Hongyuan Ma, Wei Liu, Bingjie Wei, Liang Shi, Xiuguo Bao, Lihong Wang, Bin Wang
Pages: 983-986
doi:10.1145/2600428.2609490

Caching query results is an efficient technique for Web search engines. An admission policy can prevent infrequent queries from taking cache space away from more frequent queries. In this paper we present two novel admission policies tailored for query ...
User geospatial context for music recommendation in microblogs
Markus Schedl, Andreu Vall, Katayoun Farrahi
Pages: 987-990
doi:10.1145/2600428.2609491

Music information retrieval and music recommendation are seeing a paradigm shift towards methods that incorporate user context aspects. However, structured experiments on a standardized music dataset to investigate the effects of doing so are scarce. ...
Compositional data analysis (CoDA) approaches to distance in information retrieval
Paul Thomas, David Lovell
Pages: 991-994
doi:10.1145/2600428.2609492

Many techniques in information retrieval produce counts from a sample, and it is common to analyse these counts as proportions of the whole; term frequencies are a familiar example. Proportions carry only relative information and are not free to vary ...
Group latent factor model for recommendation with multiple user behaviors
Jian Cheng, Ting Yuan, Jinqiao Wang, Hanqing Lu
Pages: 995-998
doi:10.1145/2600428.2609493

Recently, some recommendation methods have tried to relieve the data sparsity problem of Collaborative Filtering by exploiting data from users' multiple types of behaviors. However, most of the existing methods mainly model the correlation between ...
Hashing with List-Wise learning to rank
Zhou Yu, Fei Wu, Yin Zhang, Siliang Tang, Jian Shao, Yueting Zhuang
Pages: 999-1002
doi:10.1145/2600428.2609494

Hashing techniques have been extensively investigated to boost similarity search for large-scale high-dimensional data. Most of the existing approaches formulate their objective as a pair-wise similarity-preserving problem. In this paper, we consider ...
A burstiness-aware approach for document dating
Dimitrios Kotsakos, Theodoros Lappas, Dimitrios Kotzias, Dimitrios Gunopulos, Nattiya Kanhabua, Kjetil Nørvåg
Pages: 1003-1006
doi:10.1145/2600428.2609495

A large number of mainstream applications, like temporal search, event detection, and trend identification, assume knowledge of the timestamp of every document in a given textual collection. In many cases, however, the required timestamps are either ...
An analysis of query difficulty for information retrieval in the medical domain
Lorraine Goeuriot, Liadh Kelly, Johannes Leveling
Pages: 1007-1010
doi:10.1145/2600428.2609496

We present a post-hoc analysis of a benchmarking activity for information retrieval (IR) in the medical domain to determine if performance for queries with different levels of complexity can be associated with different IR methods or techniques. Our ...
Mobile query reformulations
Milad Shokouhi, Rosie Jones, Umut Ozertem, Karthik Raghunathan, Fernando Diaz
Pages: 1011-1014
doi:10.1145/2600428.2609497

Users frequently interact with web search systems on their mobile devices via multiple modalities, including touch and speech. These interaction modes are substantially different from the user experience on desktop search. As a result, system designers ...
On peculiarities of positional effects in sponsored search
Vyacheslav Alipov, Valery Topinsky, Ilya Trofimov
Pages: 1015-1018
doi:10.1145/2600428.2609498

Click logs provide a unique and highly valuable source of human judgments on ads' relevance. However, clicks are heavily biased by many factors. Two main factors that are widely acknowledged to be the most influential ones are neighboring ads and ...
A collective topic model for milestone paper discovery
Ziyu Lu, Nikos Mamoulis, David W. Cheung
Pages: 1019-1022
doi:10.1145/2600428.2609499

Prior work forms the foundation for future work in academic research. However, the increasingly large number of publications makes it difficult for researchers to effectively discover the most important previous works relevant to the topic of their research. ...
Document summarization based on word associations
Oskar Gross, Antoine Doucet, Hannu Toivonen
Pages: 1023-1026
doi:10.1145/2600428.2609500

In the age of big data, automatic methods for creating summaries of documents are becoming increasingly important. In this paper we propose a novel, unsupervised method for (multi-)document summarization. In an unsupervised and language-independent fashion, ...
Do users rate or review?: boost phrase-level sentiment labeling with review-level sentiment classification
Yongfeng Zhang, Haochen Zhang, Min Zhang, Yiqun Liu, Shaoping Ma
Pages: 1027-1030
doi:10.1145/2600428.2609501

Current approaches for contextual sentiment lexicon construction in phrase-level sentiment analysis assume that the numerical star rating of a review represents the overall sentiment orientation of the review text. Although widely adopted, we find through ...
Random subspace for binary codes learning in large scale image retrieval
Cong Leng, Jian Cheng, Hanqing Lu
Pages: 1031-1034
DOI: 10.1145/2600428.2609502

Due to their fast query speed and low storage cost, hashing-based approximate nearest neighbor search methods have attracted much attention recently. Many state-of-the-art methods are based on eigenvalue decomposition. In these approaches, the information ...
Incorporating query-specific feedback into learning-to-rank models
Ethem F. Can, W. Bruce Croft, R. Manmatha
Pages: 1035-1038
DOI: 10.1145/2600428.2609503

Relevance feedback has been shown to improve retrieval for a broad range of retrieval models. It is the most common way of adapting a retrieval model for a specific query. In this work, we build on this idea by focusing on a method that enables ...
Large-scale author verification: temporal and topical influences
Michiel van Dam, Claudia Hauff
Pages: 1039-1042
DOI: 10.1145/2600428.2609504

The task of author verification is concerned with the question of whether or not someone is the author of a given piece of text. Algorithms that extract writing style features from texts are used to determine how close in style different documents are. ...
Evaluating mobile web search performance by taking good abandonment into account
Olga Arkhipova, Lidia Grauer
Pages: 1043-1046
DOI: 10.1145/2600428.2609505

The use of mobile devices for Web search has grown rapidly in recent years. Users' tendency to want information immediately has resulted in the incorporation of rich snippets and vertical results into search engine result pages (SERPs) and in ...
Assessing the reliability and reusability of an E-discovery privilege test collection
Jyothi K. Vinjumur, Douglas W. Oard, Jiaul H. Paik
Pages: 1047-1050
DOI: 10.1145/2600428.2609506

In some jurisdictions, parties to a lawsuit can request documents from each other, but documents subject to a claim of privilege may be withheld. The TREC 2010 Legal Track developed what is presently the only public test collection for evaluating privilege ...
Modeling evolution of a social network using temporal graph kernels
Akash Anil, Niladri Sett, Sanasam Ranbir Singh
Pages: 1051-1054
DOI: 10.1145/2600428.2609507

The majority of studies on modeling the evolution of a social network using spectral graph kernels do not consider temporal effects when estimating the kernel parameters. As a result, such kernels fail to capture structural properties of the evolution ...
On user interactions with query auto-completion
Bhaskar Mitra, Milad Shokouhi, Filip Radlinski, Katja Hofmann
Pages: 1055-1058
DOI: 10.1145/2600428.2609508

Query Auto-Completion (QAC) is a popular feature of web search engines that aims to assist users to formulate queries faster and avoid spelling mistakes by presenting them with possible completions as soon as they start typing. However, despite the wide ...
Re-ranking approach to classification in large-scale power-law distributed category systems
Rohit Babbar, Ioannis Partalas, Eric Gaussier, Massih-reza Amini
Pages: 1059-1062
DOI: 10.1145/2600428.2609509

For large-scale category systems, such as Directory Mozilla, which consist of tens of thousands of categories, it has been empirically verified in earlier studies that the distribution of documents among categories can be modeled as a power-law distribution. ...
Enhancing personalization via search activity attribution
Adish Singla, Ryen W. White, Ahmed Hassan, Eric Horvitz
Pages: 1063-1066
DOI: 10.1145/2600428.2609510

Online services rely on machine identifiers to tailor services such as personalized search and advertising to individual users. The assumption made is that each identifier comprises the behavior of a single person. However, shared machine usage is common, ...
A syntax-aware re-ranker for microblog retrieval
Aliaksei Severyn, Alessandro Moschitti, Manos Tsagkias, Richard Berendsen, Maarten de Rijke
Pages: 1067-1070
DOI: 10.1145/2600428.2609511

We tackle the problem of improving microblog retrieval algorithms by proposing a robust structural representation of (query, tweet) pairs. We employ these structures in a principled kernel learning framework that automatically extracts and learns highly ...
Weighted aspect-based collaborative filtering
YanPing Nie, Yang Liu, Xiaohui Yu
Pages: 1071-1074
DOI: 10.1145/2600428.2609512

Existing work on collaborative filtering (CF) is often based on the overall ratings the items have received. However, in many cases, understanding how a user rates each aspect of an item may reveal more detailed information about her preferences and ...
Evaluating intuitiveness of vertical-aware click models
Aleksandr Chuklin, Ke Zhou, Anne Schuth, Floor Sietsma, Maarten de Rijke
Pages: 1075-1078
DOI: 10.1145/2600428.2609513

Modeling user behavior on a search engine result page is important for understanding the users and supporting simulation experiments. As result pages become more complex, click models evolve as well in order to capture additional aspects of user behavior ...
Recipient recommendation in enterprises using communication graphs and email content
David Graus, David van Dijk, Manos Tsagkias, Wouter Weerkamp, Maarten de Rijke
Pages: 1079-1082
DOI: 10.1145/2600428.2609514

We address the task of recipient recommendation for emailing in enterprises. We propose an intuitive and elegant way of modeling the task of recipient recommendation, which uses both the communication graph (i.e., who are most closely connected to the ...
Analyzing the content emphasis of web search engines
Mohammed A. Alam, Doug Downey
Pages: 1083-1086
DOI: 10.1145/2600428.2609515

Millions of people search the Web each day. As a consequence, the ranking algorithms employed by Web search engines have a profound influence on which pages users visit. Characterizing this influence, and informing users when different engines favor ...
Effects of task and domain on searcher attention
Dmitry Lagun, Eugene Agichtein
Pages: 1087-1090
DOI: 10.1145/2600428.2609516

Previous studies of online user attention during information seeking tasks have mainly focused on analyzing searcher behavior in web search settings. While these studies enabled a better understanding of search result examination, their findings might ...
Learning sufficient queries for entity filtering
Miles Efron, Craig Willis, Garrick Sherman
Pages: 1091-1094
DOI: 10.1145/2600428.2609517

Entity-centric document filtering is the task of analyzing a time-ordered stream of documents and emitting those that are relevant to a specified set of entities (e.g., people, places, organizations). This task is exemplified by the TREC Knowledge Base ...
PatentLine: analyzing technology evolution on multi-view patent graphs
Longhui Zhang, Lei Li, Tao Li, Qi Zhang
Pages: 1095-1098
DOI: 10.1145/2600428.2609518

The fast growth of technologies has driven the advancement of our society. It is often necessary to quickly grasp the evolution of technologies in order to better understand the technology trend. The availability of huge volumes of granted patent documents ...
Query performance prediction for entity retrieval
Hadas Raviv, Oren Kurland, David Carmel
Pages: 1099-1102
DOI: 10.1145/2600428.2609519

We address the query-performance-prediction task for entity retrieval; that is, retrieval effectiveness is estimated with no relevance judgements. First we show how to adapt state-of-the-art query-performance predictors proposed for document retrieval ...
Second order probabilistic models for within-document novelty detection in academic articles
Laurence A.F. Park, Simeon Simoff
Pages: 1103-1106
DOI: 10.1145/2600428.2609520

It is becoming increasingly difficult to stay aware of the state-of-the-art in any research field due to the exponential increase in the number of academic publications. This problem affects authors and reviewers of submissions to academic journals and ...
Modeling the dynamics of personal expertise
Yi Fang, Archana Godavarthy
Pages: 1107-1110
DOI: 10.1145/2600428.2609521

Personal expertise or interests often evolve over time. Despite much work on expertise retrieval in recent years, very little work has studied the dynamics of personal expertise. In this paper, we propose a probabilistic model to characterize how ...
An annotation similarity model in passage ranking for historical fact validation
Jun Araki, Jamie Callan
Pages: 1111-1114
DOI: 10.1145/2600428.2609522

State-of-the-art question answering (QA) systems employ passage retrieval based on bag-of-words similarity models with respect to a query and a passage. We propose a combination of a traditional bag-of-words similarity model and an annotation similarity ...
To hint or not: exploring the effectiveness of search hints for complex informational tasks
Denis Savenkov, Eugene Agichtein
Pages: 1115-1118
DOI: 10.1145/2600428.2609523

Extensive previous research has shown that searchers often require assistance with query formulation and refinement. Yet, it is not clear what kind of assistance is most useful, and how effective it is both objectively (e.g., in terms of task success) ...
The effect of sampling strategy on inferred measures
Ellen M. Voorhees
Pages: 1119-1122
DOI: 10.1145/2600428.2609524

Using the inferred measures framework is a popular choice for constructing test collections when the target document set is too large for pooling to be a viable option. Within the framework, different amounts of assessing effort are placed on different ...
Cache-conscious runtime optimization for ranking ensembles
Xun Tang, Xin Jin, Tao Yang
Pages: 1123-1126
DOI: 10.1145/2600428.2609525

Multi-tree ensemble models have been proven to be effective for document ranking. Using a large number of trees can improve accuracy, but it takes time to calculate ranking scores of matched documents. This paper investigates data traversal methods for ...
Bridging temporal context gaps using time-aware re-contextualization
Andrea Ceroni, Nam Khanh Tran, Nattiya Kanhabua, Claudia Niederée
Pages: 1127-1130
DOI: 10.1145/2600428.2609526

Understanding a text, which was written some time ago, can be compared to translating a text from another language. Complete interpretation requires a mapping, in this case, a kind of time-travel translation between present context knowledge and context ...
An enhanced context-sensitive proximity model for probabilistic information retrieval
Jiashu Zhao, Jimmy Xiangji Huang
Pages: 1131-1134
DOI: 10.1145/2600428.2609527

We propose to enhance proximity-based probabilistic retrieval models with more contextual information. A term pair with higher contextual relevance of term proximity is assigned a higher weight. Several measures are proposed to estimate the contextual ...
On the information difference between standard retrieval models
Peter B. Golbus, Javed A. Aslam
Pages: 1135-1138
DOI: 10.1145/2600428.2609528

Recent work introduced a probabilistic framework that measures search engine performance information-theoretically. This allows for novel meta-evaluation measures such as Information Difference, which measures the magnitude of the difference between ...
A POMDP model for content-free document re-ranking
Sicong Zhang, Jiyun Luo, Hui Yang
Pages: 1139-1142
DOI: 10.1145/2600428.2609529

Log-based document re-ranking is a special form of session search. The task re-ranks documents from the Search Engine Results Page (SERP) according to the search logs, in which both the search activities from other users and personalized query log for a ...
Using score differences for search result diversification
Sadegh Kharazmi, Mark Sanderson, Falk Scholer, David Vallet
Pages: 1143-1146
DOI: 10.1145/2600428.2609530

We investigate the application of a light-weight approach to result list clustering for the purposes of diversifying search results. We introduce a novel post-retrieval approach, which is independent of external information or even the full-text content ...
TREC: topic engineering exercise
J. Shane Culpepper, Stefano Mizzaro, Mark Sanderson, Falk Scholer
Pages: 1147-1150
DOI: 10.1145/2600428.2609531

In this work, we investigate approaches to engineer better topic sets in information retrieval test collections. By recasting the TREC evaluation exercise from one of building more effective systems to an exercise in building better topics, we present ...
How k-12 students search for learning?: analysis of an educational search engine log
Arif Usta, Ismail Sengor Altingovde, İbrahim Bahattin Vidinli, Rifat Ozcan, Özgür Ulusoy
Pages: 1151-1154
DOI: 10.1145/2600428.2609532

In this study, we analyze an educational search engine log to shed light on K-12 students' search behavior in a learning environment. We focus especially on query, session, user and click characteristics and compare the trends to the findings in ...
The correlation between cluster hypothesis tests and the effectiveness of cluster-based retrieval
Fiana Raiber, Oren Kurland
Pages: 1155-1158
DOI: 10.1145/2600428.2609533

We present a study of the correlation between the extent to which the cluster hypothesis holds, as measured by various tests, and the relative effectiveness of cluster-based retrieval with respect to document-based retrieval. We show that the correlation ...
The effect of expanding relevance judgements with duplicates
Gaurav Baruah, Adam Roegiest, Mark D. Smucker
Pages: 1159-1162
DOI: 10.1145/2600428.2609534

We examine the effects of expanding a judged set of sentences with their duplicates from a corpus. Including new sentences that are exact duplicates of the previously judged sentences may allow for better estimation of performance metrics and enhance ...
On correlation of absence time and search effectiveness
Sunandan Chakraborty, Filip Radlinski, Milad Shokouhi, Paul Baecke
Pages: 1163-1166
DOI: 10.1145/2600428.2609535

Online search evaluation metrics are typically derived from implicit feedback from users, for instance the number of page clicks, the number of queries, or the dwell time on a search result. In a recent paper, Dupret and Lalmas introduced ...
Necessary and frequent terms in queries
Jiepu Jiang, James Allan
Pages: 1167-1170
DOI: 10.1145/2600428.2609536

Vocabulary mismatch has long been recognized as one of the major issues affecting search effectiveness. Ineffective queries usually fail to incorporate important terms and/or incorrectly include inappropriate keywords. However, in this paper we show ...
Extracting topics based on authors, recipients and content in microblogs
Nazneen Fatema N. Rajani, Kate McArdle, Jason Baldridge
Pages: 1171-1174
DOI: 10.1145/2600428.2609537

Microblogs such as Twitter are important sources for spreading vital information at high speed. They also reflect the general public's reactions and opinions towards major events or stories. With information traveling so quickly, it is helpful to be able ...
Exploiting Twitter and Wikipedia for the annotation of event images
Philip James McParlane, Joemon Jose
Pages: 1175-1178
DOI: 10.1145/2600428.2609538

With the rise in popularity of smart phones, there has been a recent increase in the number of images taken at large social (e.g. festivals) and world (e.g. natural disasters) events which are uploaded to image sharing websites such as Flickr. As with ...
Learning to translate queries for CLIR
Artem Sokolov, Felix Hieber, Stefan Riezler
Pages: 1179-1182
DOI: 10.1145/2600428.2609539

The statistical machine translation (SMT) component of cross-lingual information retrieval (CLIR) systems is often regarded as a black box that is optimized for translation quality independently of the retrieval task. In recent work [10], SMT has been ...
Predicting query performance in microblog retrieval
Jesus A. Rodriguez Perez, Joemon M. Jose
Pages: 1183-1186
DOI: 10.1145/2600428.2609540

Query Performance Prediction (QPP) is the estimation of the retrieval success for a query, without explicit knowledge about relevant documents. QPP is especially interesting in the context of Automatic Query Expansion (AQE) based on Pseudo Relevance ...
An event extraction model based on timeline and user analysis in Latent Dirichlet allocation
Bayar Tsolmon, Kyung-Soon Lee
Pages: 1187-1190
DOI: 10.1145/2600428.2609541

Social media such as Twitter has come to reflect the reaction of the general public to major events. Since posts are short and noisy, it is hard to extract reliable events based on word frequency. Even though an event term appears in a particularly low ...
What makes data robust: a data analysis in learning to rank
Shuzi Niu, Yanyan Lan, Jiafeng Guo, Xueqi Cheng, Xiubo Geng
Pages: 1191-1194
DOI: 10.1145/2600428.2609542

When applying learning to rank algorithms in real search applications, noise in human-labeled training data becomes an inevitable problem that affects the performance of the algorithms. Previous work mainly focused on studying how noise affects ...
Learning to bridge colloquial and formal language applied to linking and search of E-Commerce data
Ivan Vulić, Susana Zoghbi, Marie-Francine Moens
Pages: 1195-1198
DOI: 10.1145/2600428.2609543

We study the problem of linking information between different idiomatic usages of the same language, for example, colloquial and formal language. We propose a novel probabilistic topic model called multi-idiomatic LDA (MiLDA). Its modeling principles ...
Uncovering the unarchived web
Thaer Samar, Hugo C. Huurdeman, Anat Ben-David, Jaap Kamps, Arjen de Vries
Pages: 1199-1202
DOI: 10.1145/2600428.2609544

Many national and international heritage institutes realize the importance of archiving the web for future cultural heritage. Web archiving is currently performed either by harvesting a national domain, or by crawling a pre-defined list of websites selected ...
Inferring topic-dependent influence roles of Twitter users
Chengyao Chen, Dehong Gao, Wenjie Li, Yuexian Hou
Pages: 1203-1206
DOI: 10.1145/2600428.2609545

Twitter, as one of the most popular social media platforms, provides a convenient way for people to communicate and interact with each other. It has been well recognized that influence exists during users' interactions. Some pioneering studies on finding ...
Reputation analysis with a ranked sentiment-lexicon
Filipa Peleja, João Santos, João Magalhães
Pages: 1207-1210
DOI: 10.1145/2600428.2609546

Reputation analysis is naturally linked to a sentiment analysis task of the targeted entities. This analysis leverages a sentiment lexicon that includes general sentiment words and domain-specific jargon. However, in most cases target entities are ...
On predicting religion labels in microblogging networks
Minh-Thap Nguyen, Ee-Peng Lim
Pages: 1211-1214
DOI: 10.1145/2600428.2609547

Religious belief plays an important role in how people behave, influencing how they form preferences, interpret events around them, and develop relationships with others. Traditionally, the religion labels of a user population are obtained by conducting ...
Efficiently identify local frequent keyword co-occurrence patterns in geo-tagged Twitter stream
Xiaoyang Wang, Ying Zhang, Wenjie Zhang, Xuemin Lin
Pages: 1215-1218
DOI: 10.1145/2600428.2609548

With the prevalence of geo-position enabled devices and services, a rapidly growing number of tweets are associated with geo-tags. Consequently, real-time search on geo-tagged Twitter streams has attracted great attention. In this paper, we advocate ...
Item group based pairwise preference learning for personalized ranking
Shuang Qiu, Jian Cheng, Ting Yuan, Cong Leng, Hanqing Lu
Pages: 1219-1222
DOI: 10.1145/2600428.2609549

Collaborative filtering with implicit feedback has been steadily receiving more attention, since abundant implicit feedback is more easily collected while explicit feedback is not always available. Several recent works address this ...
Where not to go?: detecting road hazards using twitter
Avinash Kumar, Miao Jiang, Yi Fang
Pages: 1223-1226
DOI: 10.1145/2600428.2609550

Conventional approaches to road hazard detection involve manual inspections of roads by government transportation agencies. These approaches are usually expensive to execute, and sometimes are not able to capture the most recent hazards. Moreover, they ...
Enhancing sketch-based sport video retrieval by suggesting relevant motion paths
Ihab Al Kabary, Heiko Schuldt
Pages: 1227-1230
DOI: 10.1145/2600428.2609551

Searching for scenes in team sport videos is a task that recurs very often in game analysis and other related activities performed by coaches. In most cases, queries are formulated on the basis of specific motion characteristics the user remembers from ...
Dynamic location models
Vanessa Murdock
Pages: 1231-1234
DOI: 10.1145/2600428.2609552

Location models built on social media have been shown to be an important step toward understanding places in queries. Current search technology focuses on predicting broad regions such as cities. Hyperlocal scenarios are important because of the increasing ...
Wikipedia-based query performance prediction
Gilad Katz, Anna Shtock, Oren Kurland, Bracha Shapira, Lior Rokach
Pages: 1235-1238
DOI: 10.1145/2600428.2609553

The query-performance prediction task is to estimate retrieval effectiveness with no relevance judgments. Pre-retrieval prediction methods operate prior to retrieval time. Hence, these predictors are often based on analyzing the query and the corpus ...
A revisit to social network-based recommender systems
Hui Li, Dingming Wu, Nikos Mamoulis
Pages: 1239-1242
DOI: 10.1145/2600428.2609554

With the rapid expansion of online social networks, social network-based recommendation has become a meaningful and effective way of suggesting new items or activities to users. In this paper, we propose two methods to improve the performance of the ...
DEMONSTRATION SESSION: Demo session
Relevation!: An open source system for information retrieval relevance assessment
Bevan Koopman, Guido Zuccon
Pages: 1243-1244
DOI: 10.1145/2600428.2611175

Relevation! is a system for performing relevance judgements for information retrieval evaluation. Relevation! is web-based, fully configurable and expandable; it allows researchers to effectively collect assessments and additional qualitative data. The ...
WenZher: comprehensive vertical search for healthcare domain
Liqiang Nie, Tao Li, Mohammad Akbari, Jialie Shen, Tat-Seng Chua
Pages: 1245-1246
DOI: 10.1145/2600428.2611176

Online health seeking has transformed the way health knowledge is exchanged and reused. The existing general and vertical health search engines, however, just routinely return lists of matched documents or question answer (QA) pairs, which may overwhelm ...
STICS: searching with strings, things, and cats
Johannes Hoffart, Dragan Milchevski, Gerhard Weikum
Pages: 1247-1248
DOI: 10.1145/2600428.2611177

This paper describes an advanced search engine that supports users in querying documents by means of keywords, entities, and categories. Users simply type words, which are automatically mapped onto appropriate suggestions for entities and categories. ...
VIRLab: a web-based virtual lab for learning and studying information retrieval models
Hui Fang, Hao Wu, Peilin Yang, ChengXiang Zhai
Pages: 1249-1250
DOI: 10.1145/2600428.2611178

In this paper, we describe VIRLab, a novel web-based virtual laboratory for Information Retrieval (IR). Unlike existing command line based IR toolkits, the VIRLab system provides a more interactive tool that enables easy implementation of retrieval functions ...
ServiceXplorer: a similarity-based web service search engine
Anne H.H. Ngu, Jiangang Ma, Quan Z. Sheng, Lina Yao, Scott Julian
Pages: 1251-1252
DOI: 10.1145/2600428.2611179

Finding relevant Web services and composing them into value-added applications is becoming increasingly important in cloud and service based marketplaces. The key problem with current approaches to finding relevant Web services is that most of them only ...
Real-time visualization and targeting of online visitors
Deepak Pai, Sandeep Zechariah George
Pages: 1253-1254
DOI: 10.1145/2600428.2611180

Identifying and targeting visitors on an e-commerce website with personalized content in real-time is extremely important to marketers. Although such targeting exists today, it is based on demographic attributes of the visitors. We show that dynamic ...
CharBoxes: a system for automatic discovery of character infoboxes from books
Manish Gupta, Piyush Bansal, Vasudeva Varma
Pages: 1255-1256
DOI: 10.1145/2600428.2611181

Entities are central to a large number of real world applications. Wikipedia shows entity infoboxes for a large number of entities. However, not much structured information is available about character entities in books. Automatic discovery of characters ...
ADAM: a system for jointly providing IR and database queries in large-scale multimedia retrieval
Ivan Giangreco, Ihab Al Kabary, Heiko Schuldt
Pages: 1257-1258
DOI: 10.1145/2600428.2611182

The tremendous increase of multimedia data in recent years has heightened the need for systems that not only allow users to search with keywords, but also support content-based retrieval in order to effectively and efficiently query large collections. ...
NicePic!: a system for extracting attractive photos from flickr streams
Sergej Zerr, Stefan Siersdorfer, Jose San Pedro, Jonathon Hare, Xiaofei Zhu
Pages: 1259-1260
DOI: 10.1145/2600428.2611183

A large number of images are continuously uploaded to popular photo sharing websites and online social communities. In this demonstration we show a novel application which automatically classifies images in a live photo stream according to their attractiveness ...
A perspective-aware approach to search: visualizing perspectives in news search results
Muhammad Atif Qureshi, Colm O'Riordan, Gabriella Pasi
Pages: 1261-1262
DOI: 10.1145/2600428.2611184

The result set from a search engine for any user's query may exhibit an inherent perspective due to issues with the search engine or issues with the underlying collection. This demonstration paper presents a system that allows users to specify at query ...
FitYou: integrating health profiles to real-time contextual suggestion
Christopher Wing, Hui Yang
Pages: 1263-1264
DOI: 10.1145/2600428.2611185

Obesity and its associated health consequences such as high blood pressure and cardiac disease affect a significant proportion of the world's population. At the same time, the popularity of location-based services (LBS) and recommender systems is continually ...
Semantic full-text search with broccoli
Hannah Bast, Florian Bäurle, Björn Buchhold, Elmar Haußmann
Pages: 1265-1266
DOI: 10.1145/2600428.2611186

We combine search in triple stores with full-text search into what we call semantic full-text search. We provide a fully functional web application that allows the incremental construction of complex queries on the English Wikipedia combined with ...
Just-for-me: an adaptive personalization system for location-aware social music recommendation
Zhiyong Cheng, Jialie Shen, Tao Mei
Pages: 1267-1268
DOI: 10.1145/2600428.2611187

In recent years, location-aware music recommendation is increasing in popularity, as more and more users consume music on the move. In this demonstration, we present an intelligent system, called Just-for-Me, to facilitate accurate music recommendation ...
A novel system for the semi automatic annotation of event images
Philip James McParlane, Joemon Jose
Pages: 1269-1270
DOI: 10.1145/2600428.2611188

With the rise in popularity of smart phones, taking and sharing photographs has never been more openly accessible. Further, photo sharing websites, such as Flickr, have made the distribution of photographs easy, resulting in an increase of visual content ...
An interactive interface for visualizing events on Twitter
Andrew J. McMinn, Daniel Tsvetkov, Tsvetan Yordanov, Andrew Patterson, Rrobi Szk, Jesus A. Rodriguez Perez, Joemon M. Jose
Pages: 1271-1272
DOI: 10.1145/2600428.2611189

In recent years, social media has become one of the most popular tools for discovering and following breaking news and ongoing events. However, tools and interfaces have lagged behind users' expectations, with current tools making it difficult to discover ...
ExperTime: tracking expertise over time
Jan Rybak, Krisztian Balog, Kjetil Nørvåg
Pages: 1273-1274
DOI: 10.1145/2600428.2611190

This paper presents ExperTime, a web-based system for tracking expertise over time. We visualize a person's expertise profile on a timeline, where we detect and characterize changes in the focus or topics of expertise. It is possible to zoom in on a ...
SESSION: Doctoral consortium
J. Shane Culpepper
Cluster links prediction for literature based discovery using latent structure and semantic features
Yakub Sebastian
Pages: 1275-1275
DOI: 10.1145/2600428.2610376

The potential impact of a scientific article has a significant correlation with its ability to establish novel connections between pre-existing knowledge [1-2]. Discovering hidden connections between the existing scientific literature is an interesting ...
Graph-based large scale RDF data compression
Wei Emma Zhang
Pages: 1276-1276
DOI: 10.1145/2600428.2610377

We propose a two-stage lossless compression approach on large scale RDF data. Our approach exploits both Representation Compression and Component Compression techniques to support query and dynamic operations directly on the compressed data.
Entity-based retrieval
Hadas Raviv
Pages: 1277-1277
DOI: 10.1145/2600428.2610378

We address the core challenge of the entity retrieval task: ranking entities in response to a query by their presumed relevance to the information need that the query represents. As an initial research direction we explored two models for entity ranking ...
Improving offline and online web search evaluation by modelling the user behaviour
Eugene Kharitonov
Pages: 1278-1278
DOI: 10.1145/2600428.2610379

Measurements are fundamental to any empirical science and, similarly, search evaluation is a vital part of information retrieval (IR). Evaluation ensures the progressive development of approaches, tools, and methods studied in this field. Apart from ...
Modelling of terms across scripts through autoencoders
Parth Gupta
Pages: 1279-1279
doi>10.1145/2600428.2610380

... scripts (e.g., Arabic, Greek and Indic languages), one can often find a large amount of user-generated transliterated content on the Web in the Roman script. Such content creates a monolingual or cross-lingual space with more than one script, which is ...
A tag-based personalized item recommendation system using tensor modeling and topic model approaches
Noor Ifada
Pages: 1280-1280
doi>10.1145/2600428.2610381

This research falls in the area of enhancing the quality of tag-based item recommendation systems. It aims to achieve this by employing a multi-dimensional user profile approach and by analyzing the semantic aspects of tags. Tag-based recommender systems ...
Novelty and diversity enhancement and evaluation in recommender systems and information retrieval
Saúl Vargas
Pages: 1281-1281
doi>10.1145/2600428.2610382

The development and evaluation of Information Retrieval and Recommender Systems have traditionally focused on the relevance and accuracy of retrieved documents and recommendations, respectively. However, there is an increasing realization that accuracy ...
Enrichment of user profiles across multiple online social networks for volunteerism matching for social enterprise
Xuemeng Song
Pages: 1282-1282
doi>10.1145/2600428.2610383

Volunteers are crucial to nonprofit organizations (NPOs) for sustaining their ongoing operations. On the other hand, many talented people are looking for appropriate volunteer opportunities to realize their dreams of making an impact on the world with ...
TUTORIAL SESSION: Tutorials
Choices and constraints: research goals and approaches in information retrieval (part 1)
Diane Kelly, Filip Radlinski, Jaime Teevan
Pages: 1283-1283
doi>10.1145/2600428.2602289

All research projects begin with a goal, for instance to describe search behavior, to predict when a person will enter a second query, or to discover which IR system performs the best. Different research goals suggest different research approaches, ranging ...
Choices and constraints: research goals and approaches in information retrieval (part 2)
Diane Kelly, Filip Radlinski, Jaime Teevan
Pages: 1284-1284
doi>10.1145/2600428.2602290

All research projects begin with a goal, for instance to describe search behavior, to predict when a person will enter a second query, or to discover which IR system performs the best. Different research goals suggest different research approaches, ranging ...
Scalability and efficiency challenges in large-scale web search engines
B. Barla Cambazoglu, Ricardo Baeza-Yates
Pages: 1285-1285
doi>10.1145/2600428.2602291

Large-scale web search engines rely on massive compute infrastructures to be able to cope with the continuous growth of the Web and their user bases. In such search engines, achieving scalability and efficiency requires making careful architectural design ...
Statistical significance testing in information retrieval: theory and practice
Ben Carterette
Pages: 1286-1286
doi>10.1145/2600428.2602292

The past 20 years have seen a great improvement in the rigor of information retrieval experimentation, due primarily to two factors: high-quality, public, portable test collections such as those produced by TREC (the Text REtrieval Conference [2]), ...
Speech search: techniques and tools for spoken content retrieval
Gareth J.F. Jones
Pages: 1287-1287
doi>10.1145/2600428.2602293
Axiomatic analysis and optimization of information retrieval models
Hui Fang, ChengXiang Zhai
Pages: 1288-1288
doi>10.1145/2600428.2602294

The axiomatic approach provides a systematic way to think about heuristics, identify the weaknesses of existing methods, and optimize those methods accordingly. This tutorial aims to promote axiomatic thinking that can benefit not only the study of ...
A general account of effectiveness metrics for information tasks: retrieval, filtering, and clustering
Enrique Amigó, Julio Gonzalo, Stefano Mizzaro
Pages: 1289-1289
doi>10.1145/2600428.2602296

In this tutorial we will present, review, and compare the most popular evaluation metrics for some of the most salient information related tasks, covering: (i) Information Retrieval, (ii) Clustering, and (iii) Filtering. The tutorial will make a special ...
Dynamic information retrieval modeling
Hui Yang, Marc Sloan, Jun Wang
Pages: 1290-1290
doi>10.1145/2600428.2602297

Dynamic aspects of Information Retrieval (IR), including changes found in data, users and systems, are increasingly being utilized in search engines and information filtering systems. Existing IR techniques are limited in their ability to optimize over ...
The retrievability of documents
Leif Azzopardi
Pages: 1291-1291
doi>10.1145/2600428.2602298

Retrievability is an important and interesting indicator that can be used in a number of ways to analyse Information Retrieval systems and document collections. Rather than focusing totally on relevance, retrievability examines what is retrieved, how ...
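As a rough illustration of the measure this tutorial centres on, the sketch below computes a cumulative retrievability score: each document earns credit for every query that retrieves it within the top `cutoff` results. The function name, the default query weight of 1.0 and the input format (a mapping from each query to its ranked document IDs) are assumptions made for this example, not taken from the tutorial.

```python
def cumulative_retrievability(rankings, collection, cutoff, query_weights=None):
    """Score each document by how often it appears in the top `cutoff` results.

    rankings: dict mapping each query to its ranked list of document IDs.
    collection: all document IDs, so documents that are never retrieved score 0.
    query_weights: optional dict of per-query weights (assumed default 1.0).
    """
    scores = {doc_id: 0.0 for doc_id in collection}
    for query, ranked_docs in rankings.items():
        weight = 1.0 if query_weights is None else query_weights.get(query, 1.0)
        # Cumulative cost function: a document counts only if ranked within the cutoff.
        for doc_id in ranked_docs[:cutoff]:
            scores[doc_id] = scores.get(doc_id, 0.0) + weight
    return scores


# Hypothetical rankings for two queries over a five-document collection.
rankings = {"q1": ["d1", "d2", "d3"], "q2": ["d2", "d1", "d4"]}
print(cumulative_retrievability(rankings, ["d1", "d2", "d3", "d4", "d5"], cutoff=2))
# {'d1': 2.0, 'd2': 2.0, 'd3': 0.0, 'd4': 0.0, 'd5': 0.0}
```

With a cutoff of 2, documents d3, d4 and d5 are never retrieved at all, which is the kind of imbalance a retrievability analysis is meant to expose.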
WORKSHOP SESSION: Workshops
ERD'14: entity recognition and disambiguation challenge
David Carmel, Ming-Wei Chang, Evgeniy Gabrilovich, Bo-June (Paul) Hsu, Kuansan Wang
Pages: 1292-1292
doi>10.1145/2600428.2600734
SIGIR 2014 workshop on gathering efficient assessments of relevance (GEAR)
Martin Halvey, Robert Villa, Paul Clough
Pages: 1293-1293
doi>10.1145/2600428.2600735

Evaluation is a fundamental part of Information Retrieval, and in the conventional Cranfield evaluation paradigm, sets of relevance assessments are a fundamental part of test collections. This workshop revisits how relevance assessments can be efficiently ...
MedIR14: medical information retrieval workshop
Lorraine Goeuriot, Gareth J.F. Jones, Liadh Kelly, Henning Müller, Justin Zobel
Pages: 1294-1294
doi>10.1145/2600428.2600736

Medical information is accessible from diverse sources including the general web, social media, journal articles, and hospital records; information searchers can be patients and their families, researchers, practitioners and clinicians. Challenges in ...
Privacy-preserving IR: when information retrieval meets privacy and security
Luo Si, Hui Yang
Pages: 1295-1295
doi>10.1145/2600428.2600737

Information retrieval (IR) and information privacy/security are two fast-growing computer science disciplines. There are many synergies and connections between these two disciplines. However, there have been very limited efforts to connect the two important ...
SIGIR 2014 workshop on semantic matching in information retrieval
Julio Gonzalo, Hang Li, Alessandro Moschitti, Jun Xu
Pages: 1296-1296
doi>10.1145/2600428.2600738

Recently, significant progress has been made in research on what we call semantic matching (SM), in web search, question answering, online advertisement, cross-language information retrieval, and other tasks. Advanced technologies based on machine learning ...
SoMeRA 2014: social media retrieval and analysis workshop
Markus Schedl, Peter Knees, Jialie Shen
Pages: 1297-1297
doi>10.1145/2600428.2600739

The SoMeRA workshop targets cutting-edge research from all fields of retrieval, recommendation, and browsing in social media, as well as the analysis of users' multifaceted traces therein. Submissions to the workshop cover a broad range of topics including ...
SIGIR 2014 workshop on temporal, social and spatially-aware information access (#TAIA2014)
Fernando Diaz, Claudia Hauff, Vanessa Murdock, Maarten de Rijke, Milad Shokouhi
Pages: 1298-1298
doi>10.1145/2600428.2600740

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Table of Contents
SESSION: Keynote address
Riding the multimedia big data wave
John R. Smith
Pages: 1-2
doi>10.1145/2484028.2494492

In this talk we present a perspective across multiple industry problems, including safety and security, medical, Web, social and mobile media, and motivate the need for large-scale analysis and retrieval of multimedia data. We describe a multi-layer ...
SESSION: User behaviour
Beliefs and biases in web search
Ryen White
Pages: 3-12
doi>10.1145/2484028.2484053

People's beliefs, and unconscious biases that arise from those beliefs, influence their judgment, decision making, and actions, as is commonly accepted among psychologists. Biases can be observed in information retrieval in situations where searchers ...
Improving search result summaries by using searcher behavior data
Mikhail Ageev, Dmitry Lagun, Eugene Agichtein
Pages: 13-22
doi>10.1145/2484028.2484093

Query-biased search result summaries, or "snippets", help users decide whether a result is relevant for their information need, and have become increasingly important for helping searchers with difficult or ambiguous search tasks. Previously published ...
How query cost affects search behavior
Leif Azzopardi, Diane Kelly, Kathy Brennan
Pages: 23-32
doi>10.1145/2484028.2484049

... affects how users interact with a search system. Microeconomic theory is used to generate the cost-interaction hypothesis, which states that as the cost of querying increases, users will pose fewer queries and examine more documents per query. A between-subjects ...
Search engine switching detection based on user personal preferences and behavior patterns
Denis Savenkov, Dmitry Lagun, Qiaoling Liu
Pages: 33-42
doi>10.1145/2484028.2484099

Sometimes, during a search task, users may switch from one search engine to another for several reasons, e.g., dissatisfaction with the current search results or a desire for broader topic coverage. Detecting such switching is difficult but important ...
SESSION: Social media and network analysis I
Emerging topic detection for organizations from microblogs
Yan Chen, Hadi Amiri, Zhoujun Li, Tat-Seng Chua
Pages: 43-52
doi>10.1145/2484028.2484057

Microblog services have emerged as an essential way to strengthen the communications among individuals and organizations. These services promote timely and active discussions and comments towards products, markets as well as public events, and have attracted ...
Pseudo test collections for training and tuning microblog rankers
Richard Berendsen, Manos Tsagkias, Wouter Weerkamp, Maarten de Rijke
Pages: 53-62
doi>10.1145/2484028.2484063

Recent years have witnessed a persistent interest in generating pseudo test collections, both for training and evaluation purposes. We describe a method for generating queries and relevance judgments for microblog search in an unsupervised way. Our starting ...
Learning latent friendship propagation networks with interest awareness for link prediction
Jun Zhang, Chaokun Wang, Philip S. Yu, Jianmin Wang
Pages: 63-72
doi>10.1145/2484028.2484029

It is well known that the transitivity of friendship is a popular sociological principle in social networks. However, it is still unknown to what extent people's friend-making behaviors follow this principle and to what extent it can benefit the link ...
An experimental study on implicit social recommendation
Hao Ma
Pages: 73-82
doi>10.1145/2484028.2484059

Social recommendation problems have drawn a lot of attention recently due to the prevalence of social networking sites. The experiments in previous literature suggest that social information is very effective in improving traditional recommendation algorithms. ...
SESSION: Queries I
Task-aware query recommendation
Henry Feild, James Allan
Pages: 83-92
doi>10.1145/2484028.2484069

When generating query recommendations for a user, a natural approach is to try to leverage not only the user's most recently submitted query, or reference query, but also information about the current search context, such as the user's recent search ...
Extracting query facets from search results
Weize Kong, James Allan
Pages: 93-102
doi>10.1145/2484028.2484097

Web search queries are often ambiguous or multi-faceted, which makes a simple ranked list of results inadequate. To assist information finding for such faceted queries, we explore a technique that explicitly represents interesting facets of a query using ...
Learning to personalize query auto-completion
Milad Shokouhi
Pages: 103-112
doi>10.1145/2484028.2484076

Query auto-completion (QAC) is one of the most prominent features of modern search engines. The list of query candidates is generated according to the prefix entered by the user in the search box and is updated on each new keystroke. Query prefixes ...
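The basic QAC loop described above (filter candidates by the typed prefix, refresh the list on every keystroke) can be sketched with the common most-popular-completion baseline, which ranks candidates by how often they were issued in the past. This is not the personalized, learned ranking the paper proposes; the function name and the toy query log are hypothetical.

```python
from collections import Counter


def most_popular_completions(prefix, query_log, k=3):
    """Rank past queries that start with `prefix` by how often they were issued."""
    candidates = [(query, count) for query, count in query_log.items() if query.startswith(prefix)]
    # Highest frequency first; ties broken alphabetically so the output is deterministic.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [query for query, _ in candidates[:k]]


# Hypothetical query log: query string -> number of times it was submitted.
query_log = Counter({"search": 10, "sigir": 5, "signal": 4, "sigmod": 3})

# Recompute the candidate list after every keystroke, as a QAC front end would.
typed = "sigm"
for i in range(1, len(typed) + 1):
    print(typed[:i], most_popular_completions(typed[:i], query_log))
```

The candidate list narrows from `['search', 'sigir', 'signal']` after "s" to `['sigmod']` after "sigm". A personalized approach would replace the global frequency in the sort key with user-specific signals.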
Leveraging conceptual lexicon: query disambiguation using proximity information for patent retrieval
Parvaz Mahdabi, Shima Gerani, Jimmy Xiangji Huang, Fabio Crestani
Pages: 113-122
doi>10.1145/2484028.2484056

Patent prior art search is a task in patent retrieval where the goal is to rank documents which describe prior art work related to a patent application. One of the main properties of patent retrieval is that the query topic is a full patent application ...
SESSION: Users and interactive IR I
Aggregated search interface preferences in multi-session search tasks
Marc Bron, Jasmijn van Gorp, Frank Nack, Lotte Belice Baltussen, Maarten de Rijke
Pages: 123-132
doi>10.1145/2484028.2484050

Aggregated search interfaces provide users with an overview of results from various sources. Two general types of display exist: tabbed, with access to each source in a separate tab, and blended, which combines multiple sources into a single result page. ...
An effective implicit relevance feedback technique using affective, physiological and behavioural features
Yashar Moshfeghi, Joemon M. Jose
Pages: 133-142
doi>10.1145/2484028.2484074

The effectiveness of various behavioural signals for implicit relevance feedback models has been exhaustively studied. Despite the advantages of such techniques for a real time information retrieval system, most of the behavioural signals are noisy and ...
How do users respond to voice input errors?: lexical and phonetic query reformulation in voice search
Jiepu Jiang, Wei Jeng, Daqing He
Pages: 143-152
doi>10.1145/2484028.2484092

Voice search offers users a new search experience: instead of typing, users can vocalize their search queries. However, due to voice input errors (such as speech recognition errors and improper system interruptions), users frequently need to reformulate ...
Mining touch interaction data on mobile devices to predict web search result relevance
Qi Guo, Haojian Jin, Dmitry Lagun, Shuai Yuan, Eugene Agichtein
Pages: 153-162
doi>10.1145/2484028.2484100

Fine-grained search interactions in the desktop setting, such as mouse cursor movements and scrolling, have been shown to be valuable for understanding user intent, attention, and their preferences for Web search results. As web search on smart phones and ...
SESSION: Efficiency I
An information-theoretic account of static index pruning
Ruey-Cheng Chen, Chia-Jung Lee
Pages: 163-172
doi>10.1145/2484028.2484061

In this paper, we recast static index pruning as a model induction problem under the framework of Kullback's principle of minimum cross-entropy. We show that static index pruning has an approximate analytical solution in the form of a convex integer program. ...
Document identifier reassignment and run-length-compressed inverted indexes for improved search performance
Diego Arroyuelo, Senén González, Mauricio Oyarzún, Victor Sepulveda
Pages: 173-182
doi>10.1145/2484028.2484079

Text search engines are a fundamental tool nowadays. Their efficiency relies on a popular and simple data structure: the inverted index. Currently, inverted indexes can be represented very efficiently using index compression schemes. Recent investigations ...
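To make the compression idea concrete, here is a sketch of two standard building blocks: storing a posting list as d-gaps (differences between consecutive document identifiers) and run-length encoding repeated gaps. Runs of gap 1 appear when related documents receive consecutive identifiers, which is the usual motivation for reassigning identifiers. The function names are illustrative, and the paper's actual encoding may differ from this simple (gap, run length) scheme.

```python
def to_dgaps(postings):
    """Convert a sorted list of docIDs into d-gaps (the first gap is the first docID)."""
    gaps, previous = [], 0
    for doc_id in postings:
        gaps.append(doc_id - previous)
        previous = doc_id
    return gaps


def run_length_encode(gaps):
    """Collapse consecutive equal gaps into (gap, run_length) pairs."""
    runs = []
    for gap in gaps:
        if runs and runs[-1][0] == gap:
            runs[-1] = (gap, runs[-1][1] + 1)
        else:
            runs.append((gap, 1))
    return runs


def run_length_decode(runs):
    """Rebuild the original docID list from (gap, run_length) pairs."""
    postings, current = [], 0
    for gap, length in runs:
        for _ in range(length):
            current += gap
            postings.append(current)
    return postings


postings = [3, 4, 5, 6, 7, 20, 21, 22]
runs = run_length_encode(to_dgaps(postings))
print(runs)  # [(3, 1), (1, 4), (13, 1), (1, 2)]
```

Eight postings shrink to four pairs because most identifiers are consecutive; with a poor identifier assignment the same documents would yield few runs and little saving.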
Fast document-at-a-time query processing using two-tier indexes
Cristian Rossi, Edleno S. de Moura, Andre L. Carvalho, Altigran S. da Silva
Pages: 183-192
doi>10.1145/2484028.2484085

In this paper we present two new algorithms designed to reduce the overall time required to process top-k queries. These algorithms are based on the document-at-a-time approach and modify the best baseline we found in the literature, Blockmax WAND (BMW), ...
Faster and smaller inverted indices with treaps
Roberto Konow, Gonzalo Navarro, Charles L.A. Clarke, Alejandro López-Ortíz
Pages: 193-202
doi>10.1145/2484028.2484088

We introduce a new representation of the inverted index that performs faster ranked unions and intersections while using less space. Our index is based on the treap data structure, which allows us to intersect/merge the document identifiers while simultaneously ...
SESSION: Topic modeling
An unsupervised topic segmentation model incorporating word order
Shoaib Jameel, Wai Lam
Pages: 203-212
doi>10.1145/2484028.2484062

We present a new unsupervised topic discovery model for a collection of text documents. In contrast to the majority of the state-of-the-art topic models, our model does not break the document's structure such as paragraphs and sentences. In addition, ...
Semantic hashing using tags and topic modeling
Qifan Wang, Dan Zhang, Luo Si
Pages: 213-222
doi>10.1145/2484028.2484037

It is an important research problem to design efficient and effective solutions for large scale similarity search. One popular strategy is to represent data examples as compact binary codes through semantic hashing, which has produced promising results ...
Incorporating popularity in topic models for social network analysis
Youngchul Cha, Bin Bi, Chu-Cheng Hsieh, Junghoo Cho
Pages: 223-232
doi>10.1145/2484028.2484086

Topic models are used to group words in a text dataset into a set of relevant topics. Unfortunately, when a few words frequently appear in a dataset, the topic groups identified by topic models become noisy because these frequent words repeatedly appear ...
Topic hierarchy construction for the organization of multi-source user generated contents
Xingwei Zhu, Zhao-Yan Ming, Xiaoyan Zhu, Tat-Seng Chua
Pages: 233-242
doi>10.1145/2484028.2484032

User generated contents (UGCs) carry a huge amount of high quality information. However, the information overload and diversity of UGC sources limit their potential uses. In this research, we propose a framework to organize information from multiple ...
SESSION: Users and interactive IR II
Looking ahead: query preview in exploratory search
Pernilla Qvarfordt, Gene Golovchinsky, Tony Dunnigan, Elena Agapie
Pages: 243-252
doi>10.1145/2484028.2484084

Exploratory search is a complex, iterative information seeking activity that involves running multiple queries and finding and examining many documents. We designed a query preview control that visualizes the distribution of newly-retrieved and re-retrieved ...
News vertical search: when and what to display to users
Richard McCreadie, Craig Macdonald, Iadh Ounis
Pages: 253-262
doi>10.1145/2484028.2484080

News reporting has seen a shift toward fast-paced online reporting in new sources such as social media. Web Search engines that support a news vertical have historically relied upon articles published by major newswire providers when serving news-related ...
Toward self-correcting search engines: using underperforming queries to improve search
Ahmed Hassan, Ryen W. White, Yi-Min Wang
Pages: 263-272
doi>10.1145/2484028.2484043

Search engines receive queries with a broad range of different search intents. However, they do not perform equally well for all queries. Understanding where search engines perform poorly is critical for improving their performance. In this paper, we ...
Fighting search engine amnesia: reranking repeated results
Milad Shokouhi, Ryen W. White, Paul Bennett, Filip Radlinski
Pages: 273-282
doi>10.1145/2484028.2484075

Web search engines frequently show the same documents repeatedly for different queries within the same search session, in essence forgetting when the same documents were already shown to users. Depending on previous user interaction with the repeated ...
SESSION: Recommender systems
Addressing cold-start in app recommendation: latent user models constructed from twitter followers
Jovian Lin, Kazunari Sugiyama, Min-Yen Kan, Tat-Seng Chua
Pages: 283-292
doi>10.1145/2484028.2484035

As a tremendous number of mobile applications (apps) are readily available, users have difficulty in identifying apps that are relevant to their interests. Recommender systems that depend on previous user ratings (i.e., collaborative filtering, or CF) ...
A location-based news article recommendation with explicit localized semantic analysis
Jeong-Woo Son, A-Yeong Kim, Seong-Bae Park
Pages: 293-302
doi>10.1145/2484028.2484064

The interest of users in handheld devices is strongly related to their location. Therefore, the user location is important, as a user context, for news article recommendation in a mobile environment. This paper proposes a novel news article recommendation ...
Opportunity model for e-commerce recommendation: right product; right time
Jian Wang, Yi Zhang
Pages: 303-312
doi>10.1145/2484028.2484067

Most existing e-commerce recommender systems aim to recommend the right product to a user, based on whether the user is likely to purchase or like a product. On the other hand, the effectiveness of recommendations also depends on the time of the recommendation. ...
Improve collaborative filtering through bordered block diagonal form matrices
Yongfeng Zhang, Min Zhang, Yiqun Liu, Shaoping Ma
Pages: 313-322
doi>10.1145/2484028.2484101

Collaborative Filtering-based recommendation algorithms have achieved widespread success on the Web, but little work has been performed to investigate appropriate user-item relationship structures of rating matrices. This paper presents a novel and general ...
SESSION: Retrieval models and ranking I
Personalized ranking model adaptation for web search
Hongning Wang, Xiaodong He, Ming-Wei Chang, Yang Song, Ryen W. White, Wei Chu
Pages: 323-332
doi>10.1145/2484028.2484068

Search engines train and apply a single ranking model across all users, but searchers' information needs are diverse and cover a broad range of topics. Hence, a single user-independent ranking model is insufficient to satisfy different users' result ...
Ranking document clusters using markov random fields
Fiana Raiber, Oren Kurland
Pages: 333-342
doi>10.1145/2484028.2484042

An important challenge in cluster-based document retrieval is ranking document clusters by their relevance to the query. We present a novel cluster ranking approach that utilizes Markov Random Fields (MRFs). MRFs enable the integration of various types ...
A novel TF-IDF weighting scheme for effective ranking
Jiaul H. Paik
Pages: 343-352
doi>10.1145/2484028.2484070

Term weighting schemes are central to the study of information retrieval systems. This article proposes a novel TF-IDF term weighting scheme that employs two different within document term frequency normalizations to capture two different aspects of ...
Retrieving documents with mathematical content
Shahab Kamali, Frank Wm. Tompa
Pages: 353-362
doi>10.1145/2484028.2484083

Many documents with mathematical content are published on the Web, but conventional search engines that rely on keyword search only cannot fully exploit their mathematical information. In particular, keyword search is insufficient when expressions in ...
SESSION: Time
Time-aware point-of-interest recommendation
Quan Yuan, Gao Cong, Zongyang Ma, Aixin Sun, Nadia Magnenat-Thalmann
Pages: 363-372
doi>10.1145/2484028.2484030

The availability of large volumes of user check-in data from rapidly growing location-based social networks (LBSNs) enables many important location-aware services for users. Point-of-interest (POI) recommendation is one such service, which is to ...
Modeling user's receptiveness over time for recommendation
Wei Chen, Wynne Hsu, Mong Li Lee
Pages: 373-382
doi>10.1145/2484028.2484047

Existing recommender systems model user interests and the social influences independently. In reality, user interests may change over time, and as the interests change, new friends may be added while old friends grow apart and the new friendships formed ...
Query representation for cross-temporal information retrieval
Miles Efron
Pages: 383-392
doi>10.1145/2484028.2484054

This paper addresses the problem of long-term language change in information retrieval (IR) systems. IR research has often ignored lexical drift. But in the emerging domain of massive digitized book collections, the risk of vocabulary mismatch due to ...
SESSION: Evaluation I
On the measurement of test collection reliability
Julián Urbano, Mónica Marrero, Diego Martín
Pages: 393-402
doi>10.1145/2484028.2484038

The reliability of a test collection is proportional to the number of queries it contains. But building a collection with many queries is expensive, so researchers have to find a balance between reliability and cost. Previous work on the measurement ...
Deciding on an adjustment for multiplicity in IR experiments
Leonid Boytsov, Anna Belova, Peter Westfall
Pages: 403-412
doi>10.1145/2484028.2484034

We evaluate statistical inference procedures for small-scale IR experiments that involve multiple comparisons against the baseline. These procedures adjust for multiple comparisons by ensuring that the probability of observing at least one false positive ...
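The family-wise error rate described here (the probability of observing at least one false positive across all comparisons) can be controlled by adjusting p-values. The sketch below implements the Holm step-down procedure, one standard FWER-controlling method, purely as an illustration; it is not necessarily one of the procedures the paper evaluates, and the p-values are made up.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values (controls the family-wise error rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Adjusted p-values must not decrease as the raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical p-values from comparing four systems against one baseline.
p_values = [0.01, 0.04, 0.03, 0.20]
adjusted = holm_adjust(p_values)
significant = [p <= 0.05 for p in adjusted]
print(adjusted, significant)
```

The adjusted values are approximately 0.04, 0.09, 0.09 and 0.20, so only the first comparison remains significant at the 0.05 level, even though three raw p-values fall below 0.05.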
Preference based evaluation measures for novelty and diversity
Praveen Chandar, Ben Carterette
Pages: 413-422
doi>10.1145/2484028.2484094

Novel and diverse document ranking is an effective strategy that involves reducing redundancy in a ranked list to maximize the amount of novel and relevant information available to users. Evaluation for novelty and diversity typically involves an assessor ...
SESSION: Multimedia
Competence-based song recommendation
Lidan Shou, Kuang Mao, Xinyuan Luo, Ke Chen, Gang Chen, Tianlei Hu
Pages: 423-432
doi>10.1145/2484028.2484048

Singing is a popular social activity and a good way of expressing one's feelings. One important reason for an unsuccessful singing performance is that the singer fails to choose a suitable song. In this paper, we propose a novel singing competence-based ...
A low rank structural large margin method for cross-modal ranking
Xinyan Lu, Fei Wu, Siliang Tang, Zhongfei Zhang, Xiaofei He, Yueting Zhuang
Pages: 433-442
doi>10.1145/2484028.2484039

Cross-modal retrieval is a classic research topic in multimedia information retrieval. The traditional approaches study the problem as a pairwise similarity function problem. In this paper, we consider this problem from a new perspective as a listwise ...
Learning to name faces: a multimodal learning scheme for search-based face annotation
Dayong Wang, Steven C.H. Hoi, Pengcheng Wu, Jianke Zhu, Ying He, Chunyan Miao
Pages: 443-452
doi>10.1145/2484028.2484040

Automated face annotation aims to automatically detect human faces from a photo and further name the faces with the corresponding human names. In this paper, we tackle this open problem by investigating a search-based face annotation (SBFA) paradigm ...
SESSION: Search sessions
Utilizing query change for session search
Dongyi Guan, Sicong Zhang, Hui Yang
Pages: 453-462
doi>10.1145/2484028.2484055

Session search is the Information Retrieval (IR) task that performs document retrieval for a search session. During a session, a user constantly modifies queries in order to find relevant documents that fulfill the information need. This paper proposes ...
Toward whole-session relevance: exploring intrinsic diversity in web search
Karthik Raman, Paul N. Bennett, Kevyn Collins-Thompson
Pages: 463-472
doi>10.1145/2484028.2484089

Current research on web search has focused on optimizing and evaluating single queries. However, a significant fraction of user queries are part of more complex tasks [20] which span multiple queries across one or more search sessions [26,24]. An ideal ...
Summaries, ranked retrieval and sessions: a unified framework for information access evaluation
Tetsuya Sakai, Zhicheng Dou
Pages: 473-482
doi>10.1145/2484028.2484031

We introduce a general information access evaluation framework that can potentially handle summaries, ranked document lists and even multi-query sessions seamlessly. Our framework first builds a trailtext which represents a concatenation of all ...
SESSION: Click models
Modeling click-through based word-pairs for web search
Jagadeesh Jagarlamudi, Jianfeng Gao
Pages: 483-492
doi>10.1145/2484028.2484082

Statistical translation models and latent semantic analysis (LSA) are two effective approaches to exploiting click-through data for Web search ranking. While the former learns semantic relationships between query terms and document terms directly, the ...
Click model-based information retrieval metrics
Aleksandr Chuklin, Pavel Serdyukov, Maarten de Rijke
Pages: 493-502
doi>10.1145/2484028.2484071

In recent years many models have been proposed that are aimed at predicting clicks of web search users. In addition, some information retrieval evaluation metrics have been built on top of a user model. In this paper we bring these two directions together ...
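One well-known example of a metric built on a user model is expected reciprocal rank (ERR), which is derived from the cascade click model: the user scans results top-down and is satisfied at rank r with probability R_r = (2^g_r - 1) / 2^g_max, where g_r is the relevance grade at rank r, so ERR = sum over r of (1/r) * R_r * product over i < r of (1 - R_i). The sketch below computes ERR for one ranked list to illustrate the general idea; it is not the specific metrics this paper derives, and the 0 to 3 grade scale is an assumption.

```python
def expected_reciprocal_rank(grades, max_grade):
    """ERR for one ranked list of relevance grades under the cascade user model."""
    p_reach = 1.0  # probability the user is still scanning when reaching this rank
    err = 0.0
    for rank, grade in enumerate(grades, start=1):
        # Probability that the result at this rank satisfies (and stops) the user.
        p_satisfied = (2 ** grade - 1) / 2 ** max_grade
        err += p_reach * p_satisfied / rank
        p_reach *= 1 - p_satisfied
    return err


# Hypothetical grades for the top three results on a 0-3 scale.
print(expected_reciprocal_rank([3, 2, 0], max_grade=3))  # 0.8984375
```

A highly relevant first result dominates the score, because a user who is satisfied there rarely continues to the lower ranks.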
Incorporating vertical results into search click models
Chao Wang, Yiqun Liu, Min Zhang, Shaoping Ma, Meihong Zheng, Jing Qian, Kuo Zhang
Pages: 503-512
doi>10.1145/2484028.2484036

In modern search engines, an increasing number of search result pages (SERPs) are federated from multiple specialized search engines (called verticals, such as Image or Video). As an effective approach to interpret users' click-through behavior as feedback ...
SESSION: Social media and network analysis II
Personalized time-aware tweets summarization
Zhaochun Ren, Shangsong Liang, Edgar Meij, Maarten de Rijke
Pages: 513-522
doi>10.1145/2484028.2484052

We focus on the problem of selecting meaningful tweets given a user's interests; the dynamic nature of user interests, the sheer volume, and the sparseness of individual messages make this a challenging problem. Specifically, we consider the task of ...
Exploiting hybrid contexts for Tweet segmentation
Chenliang Li, Aixin Sun, Jianshu Weng, Qi He
Pages: 523-532
doi>10.1145/2484028.2484044
Full text: PDF

Twitter has attracted hundreds of millions of users to share and disseminate the most up-to-date information. However, the noisy and short nature of tweets makes many applications in information retrieval (IR) and natural language processing (NLP) challenging. ...
Sumblr: continuous summarization of evolving tweet streams
Lidan Shou, Zhenhua Wang, Ke Chen, Gang Chen
Pages: 533-542
doi>10.1145/2484028.2484045
Full text: PDF

With the explosive growth of microblogging services, short-text messages (also known as tweets) are being created and shared at an unprecedented rate. Tweets in their raw form can be incredibly informative, but also overwhelming. For both end-users and ...
Exploiting user feedback to learn to rank answers in q&a forums: a case study with stack overflow
Daniel Hasan Dalip, Marcos André Gonçalves, Marco Cristo, Pavel Calado
Pages: 543-552
doi>10.1145/2484028.2484072
Full text: PDF

Collaborative web sites, such as collaborative encyclopedias, blogs, and forums, are characterized by loose edit control, which allows anyone to freely edit their content. As a consequence, the quality of this content is a major concern. To deal with ...
SESSION: Queries II
An incremental approach to efficient pseudo-relevance feedback
Hao Wu, Hui Fang
Pages: 553-562
doi>10.1145/2484028.2484051
Full text: PDF

Pseudo-relevance feedback is an important strategy to improve search accuracy. It is often implemented as a two-round retrieval process: the first round is to retrieve an initial set of documents relevant to the original query, and the second round is ...
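The generic two-round pseudo-relevance feedback process mentioned in the abstract above can be illustrated with a minimal sketch. It is not the incremental method proposed in the paper: the scoring function (weighted term frequency), the expansion rule (most frequent non-query terms in the top-ranked documents) and the names `pseudo_relevance_feedback`, `fb_docs`, `fb_terms` and `beta` are all hypothetical choices made for illustration.

```python
from collections import Counter

def score(query_weights, doc_tokens):
    """Weighted term-frequency score (hypothetical stand-in for BM25, query likelihood, etc.)."""
    counts = Counter(doc_tokens)
    return sum(w * counts[t] for t, w in query_weights.items())

def retrieve(query_weights, docs, k):
    """Return the ids of the top-k documents; ties keep the collection order."""
    ranked = sorted(docs, key=lambda d: score(query_weights, docs[d]), reverse=True)
    return ranked[:k]

def pseudo_relevance_feedback(query, docs, fb_docs=2, fb_terms=2, beta=0.5):
    # Round 1: retrieve an initial set for the original query and assume,
    # without any relevance judgments, that these documents are relevant.
    original = dict(Counter(query))
    feedback = retrieve(original, docs, fb_docs)

    # Assumed expansion rule: the most frequent non-query terms in the feedback set.
    pooled = Counter(t for d in feedback for t in docs[d] if t not in original)
    expanded = dict(original)
    for term, _ in pooled.most_common(fb_terms):
        expanded[term] = beta  # expansion terms get a lower weight than query terms

    # Round 2: rank the whole collection with the expanded query.
    return retrieve(expanded, docs, len(docs)), expanded


docs = {
    "d1": ["apple", "fruit", "juice"],
    "d2": ["apple", "fruit"],
    "d3": ["fruit", "juice", "orange"],
    "d4": ["car", "engine"],
}
ranking, expanded = pseudo_relevance_feedback(["apple"], docs)
print(ranking)   # ['d1', 'd2', 'd3', 'd4']
print(expanded)  # {'apple': 1, 'fruit': 0.5, 'juice': 0.5}
```

In the first round `d3` and `d4` both score zero for the query "apple"; after expansion, `d3` scores above `d4` because it shares the feedback terms "fruit" and "juice". The cost the abstract alludes to is visible here: the collection is scored twice.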
Query expansion using path-constrained random walks
Jianfeng Gao, Gu Xu, Jinxi Xu
Pages: 563-572
doi>10.1145/2484028.2484058
Full text: PDF

This paper exploits Web search logs for query expansion (QE) by presenting a new QE method based on path-constrained random walks (PCRW), where the search logs are represented as a labeled, directed graph, and the probability of picking an expansion ...
Efficient query construction for large scale data
Elena Demidova, Xuan Zhou, Wolfgang Nejdl
Pages: 573-582
doi>10.1145/2484028.2484078
Full text: PDF

In recent years, a number of open databases have emerged on the Web, providing Web users with platforms to collaboratively create structured information. As these databases are intended to accommodate heterogeneous information and knowledge, they usually ...
Compact query term selection using topically related text
K. Tamsin Maxwell, W. Bruce Croft
Pages: 583-592
doi>10.1145/2484028.2484096
Full text: PDF

Many recent and highly effective retrieval models for long queries use query reformulation methods that jointly optimize term weights and term selection. These methods learn using word context and global context but typically fail to capture query context. ...
SESSION: Diversity
Sentiment diversification with different biases
Elif Aktolga, James Allan
Pages: 593-602
doi>10.1145/2484028.2484060
Full text: PDF

Prior search result diversification work focuses on achieving topical variety in a ranked list, typically equally across all aspects. In this paper, we diversify with sentiments according to an explicit bias. We want to allow users to switch the result ...
Term level search result diversification
Van Dang, W. Bruce Croft
Pages: 603-612
doi>10.1145/2484028.2484095
Full text: PDF

Current approaches for search result diversification have been categorized as either implicit or explicit. The implicit approach assumes each document represents its own topic, and promotes diversity by selecting documents for different topics based ...
Search result diversification in resource selection for federated search
Dzung Hong, Luo Si
Pages: 613-622
doi>10.1145/2484028.2484091
Full text: PDF

Prior research in resource selection for federated search mainly focused on selecting a small number of information sources that are most relevant to a user query. However, result novelty and diversification are largely unexplored, which does not reflect ...
SESSION: Evaluation II
The effect of threshold priming and need for cognition on relevance calibration and assessment
Falk Scholer, Diane Kelly, Wan-Ching Wu, Hanseul S. Lee, William Webber
Pages: 623-632
doi>10.1145/2484028.2484090
Full text: PDF

Human assessments of document relevance are needed for the construction of test collections, for ad-hoc evaluation, and for training text classifiers. Showing documents to assessors in different orderings, however, may lead to different assessment outcomes. ...
User model-based metrics for offline query suggestion evaluation
Eugene Kharitonov, Craig Macdonald, Pavel Serdyukov, Iadh Ounis
Pages: 633-642
doi>10.1145/2484028.2484041
Full text: PDF

Query suggestion or auto-completion mechanisms are widely used by search engines and are increasingly attracting interest from the research community. However, the lack of commonly accepted evaluation methodology and metrics means that it is not possible ...
A general evaluation measure for document organization tasks
Enrique Amigó, Julio Gonzalo, Felisa Verdejo
Pages: 643-652
doi>10.1145/2484028.2484081
Full text: PDF

A number of key Information Access tasks -- Document Retrieval, Clustering, Filtering, and their combinations -- can be seen as instances of a generic document organization problem that establishes priority and relatedness relationships between ...
SESSION: Retrieval models and ranking II
Modeling term dependencies with quantum language models for IR
Alessandro Sordoni, Jian-Yun Nie, Yoshua Bengio
Pages: 653-662
doi>10.1145/2484028.2484098
Full text: PDF

Traditional information retrieval (IR) models use bag-of-words as the basic representation and assume that some form of independence holds between terms. Representing term dependencies and defining a scoring function capable of integrating such additional ...
Copulas for information retrieval
Carsten Eickhoff, Arjen P. de Vries, Kevyn Collins-Thompson
Pages: 663-672
doi>10.1145/2484028.2484066
Full text: PDF

In many domains of information retrieval, system estimates of document relevance are based on multidimensional quality criteria that have to be accommodated in a unidimensional result ranking. Current solutions to this challenge are often inconsistent ...
Taily: shard selection using the tail of score distributions
Robin Aly, Djoerd Hiemstra, Thomas Demeester
Pages: 673-682
doi>10.1145/2484028.2484033
Full text: PDF

Search engines can improve their efficiency by selecting only a few promising shards for each query. State-of-the-art shard selection algorithms first query a central index of sampled documents, and their effectiveness is similar to searching all shards. ...
A mutual information-based framework for the analysis of information retrieval systems
Peter B. Golbus, Javed A. Aslam
Pages: 683-692
doi>10.1145/2484028.2484073
Full text: PDF

We consider the problem of information retrieval evaluation and the methods and metrics used for such evaluations. We propose a probabilistic framework for evaluation which we use to develop new information-theoretic evaluation metrics. We demonstrate ...
SESSION: Efficiency II
The impact of solid state drive on search engine cache management
Jianguo Wang, Eric Lo, Man Lung Yiu, Jiancong Tong, Gang Wang, Xiaoguang Liu
Pages: 693-702
doi>10.1145/2484028.2484046
Full text: PDF

Caching is an important optimization in search engine architectures. Existing caching techniques for search engine optimization are mostly biased towards the reduction of random accesses to disks, because random accesses are known to be much more expensive ...
Faster upper bounding of intersection sizes
Daisuke Takuma, Hiroki Yanagisawa
Pages: 703-712
doi>10.1145/2484028.2484065
Full text: PDF

There is a long history of developing efficient algorithms for set intersection, which is a fundamental operation in information retrieval and databases. In this paper, we describe a new data structure, a Cardinality Filter, to quickly compute ...
Cache-conscious performance optimization for similarity search
Maha Alabduljalil, Xun Tang, Tao Yang
Pages: 713-722
doi>10.1145/2484028.2484077
Full text: PDF

All-pairs similarity search can be implemented in two stages. The first stage is to partition the data and group potentially similar vectors. The second stage is to run a set of tasks where each task compares a partition of vectors with other candidate ...
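The two-stage structure described in the abstract above (partition the data, then run tasks that each compare one partition against candidate partitions) can be sketched as follows, assuming sparse vectors stored as Python dicts and cosine similarity with a fixed threshold. The fixed-size blocking used for stage 1 is a hypothetical simplification; it is not the paper's partitioning scheme, and the sketch makes no attempt at its cache-conscious optimizations. All function names are illustrative.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {feature: weight} dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def partition(ids, block_size):
    """Stage 1 (hypothetical simplification): split sorted ids into fixed-size blocks."""
    return [ids[i:i + block_size] for i in range(0, len(ids), block_size)]

def all_pairs_similarity(vectors, threshold, block_size=2):
    blocks = partition(sorted(vectors), block_size)
    results = []
    # Stage 2: one task per (block, same-or-later block) combination, so every
    # unordered pair of vectors is compared exactly once.
    for i, a in enumerate(blocks):
        for j in range(i, len(blocks)):
            b = blocks[j]
            pairs = combinations(a, 2) if i == j else ((x, y) for x in a for y in b)
            for x, y in pairs:
                s = cosine(vectors[x], vectors[y])
                if s >= threshold:
                    results.append((x, y, round(s, 3)))
    return results


vectors = {
    "a": {"x": 1.0, "y": 1.0},
    "b": {"x": 1.0, "y": 1.0},
    "c": {"z": 1.0},
    "d": {"x": 1.0},
}
print(all_pairs_similarity(vectors, threshold=0.7))
# [('a', 'b', 1.0), ('a', 'd', 0.707), ('b', 'd', 0.707)]
```

Because each block combination is an independent task, the same result is obtained for any block size; the block size only changes how the work is grouped, which is where memory-hierarchy effects would come into play in a real implementation.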
A candidate filtering mechanism for fast top-k query processing on modern cpus
Constantinos Dimopoulos, Sergey Nepomnyachiy, Torsten Suel
Pages: 723-732
doi>10.1145/2484028.2484087
Full text: PDF

A large amount of research has focused on faster methods for finding top-k results in large document collections, one of the main scalability challenges for web search engines. In this paper, we propose a method for accelerating such top-k queries that ...
SESSION: Short papers 1 -- evaluation
A test collection for entity search in DBpedia
Krisztian Balog, Robert Neumayer
Pages: 737-740
doi>10.1145/2484028.2484165
Full text: PDF

We develop and make publicly available an entity search test collection based on the DBpedia knowledge base. This includes a large number of queries and corresponding relevance judgments from previous benchmarking campaigns, covering a broad range of ...
Author disambiguation by hierarchical agglomerative clustering with adaptive stopping criterion
Lei Cen, Eduard C. Dragut, Luo Si, Mourad Ouzzani
Pages: 741-744
doi>10.1145/2484028.2484157
Full text: PDF

Entity disambiguation is an important step in many information retrieval applications. This paper proposes new research on entity disambiguation with a focus on name disambiguation in digital libraries. In particular, pairwise similarity is first ...
Document features predicting assessor disagreement
Praveen Chandar, William Webber, Ben Carterette
Pages: 745-748
doi>10.1145/2484028.2484161
Full text: PDF

The notion of relevance differs between assessors, thus giving rise to assessor disagreement. Although assessor disagreement has been frequently observed, the factors leading to disagreement are still an open problem. In this paper we study the relationship ...
Exploring semi-automatic nugget extraction for Japanese one click access evaluation
Matthew Ekstrand-Abueg, Virgil Pavlu, Makoto Kato, Tetsuya Sakai, Takehiro Yamamoto, Mayu Iwata
Pages: 749-752
doi>10.1145/2484028.2484153
Full text: PDF

Building test collections based on nuggets is useful for evaluating systems that return documents, answers, or summaries. However, nugget construction requires a lot of manual work and is not feasible for large query sets. Towards an efficient and ...
Report from the NTCIR-10 1CLICK-2 Japanese subtask: baselines, upperbounds and evaluation robustness
Makoto P. Kato, Tetsuya Sakai, Takehiro Yamamoto, Mayu Iwata
Pages: 753-756
doi>10.1145/2484028.2484117
Full text: PDF

The One Click Access Task (1CLICK) of NTCIR requires systems to return a concise multi-document summary of web pages in response to a query which is assumed to have been submitted in a mobile context. Systems are evaluated based on information units ...
Building a web test collection using social media
Chia-Jung Lee, W. Bruce Croft
Pages: 757-760
doi>10.1145/2484028.2484139
Full text: PDF

Community Question Answering (CQA) platforms contain a large number of questions and associated answers. Answerers sometimes include URLs as part of the answers to provide further information. This paper describes a novel way of building a test collection ...
Summary of the NTCIR-10 INTENT-2 task: subtopic mining and search result diversification
Tetsuya Sakai, Zhicheng Dou, Takehiro Yamamoto, Yiqun Liu, Min Zhang, Makoto P. Kato, Ruihua Song, Mayu Iwata
Pages: 761-764
doi>10.1145/2484028.2484104
Full text: PDF

The NTCIR INTENT task comprises two subtasks: Subtopic Mining, where systems are required to return a ranked list of subtopic strings for each given query; and Document Ranking, where systems are required to return a diversified web ...
Is relevance hard work?: evaluating the effort of making relevant assessments
Robert Villa, Martin Halvey
Pages: 765-768
doi>10.1145/2484028.2484150
Full text: PDF

The judging of relevance has been a subject of study in information retrieval for a long time, especially in the creation of relevance judgments for test collections. While the criteria by which assessors judge relevance have been intensively studied, ...
SESSION: Short papers 1 -- filtering and recommending
A weakly-supervised detection of entity central documents in a stream
Ludovic Bonnefoy, Vincent Bouvier, Patrice Bellot
Pages: 769-772
doi>10.1145/2484028.2484180
Full text: PDF

Filtering a time-ordered corpus for documents that are highly relevant to an entity is a task receiving more and more attention over the years. One application is to reduce the delay between the moment a piece of information about an entity is first observed ...
Sentiment analysis of user comments for one-class collaborative filtering over ted talks
Nikolaos Pappas, Andrei Popescu-Belis
Pages: 773-776
doi>10.1145/2484028.2484116
Full text: PDF

User-generated texts such as reviews, comments or discussions are valuable indicators of users' preferences. Unlike previous work, which focuses on labeled data from user-contributed reviews, we focus here on user comments which are not accompanied by ...
Modeling the uniqueness of the user preferences for recommendation systems
Haggai Roitman, David Carmel, Yosi Mass, Iris Eiron
Pages: 777-780
doi>10.1145/2484028.2484102
Full text: PDF

In this paper we propose a novel framework for modeling the uniqueness of the user preferences for recommendation systems. User uniqueness is determined by learning to what extent the user's item preferences deviate from those of an "average user" in ...
Recommending personalized touristic sights using google places
Maya Sappelli, Suzan Verberne, Wessel Kraaij
Pages: 781-784
doi>10.1145/2484028.2484155
Full text: PDF

The purpose of the Contextual Suggestion track, an evaluation task at the TREC 2012 conference, is to suggest personalized tourist activities to an individual, given a certain location and time. In our content-based approach, we collected initial recommendations ...
Optimizing top-n collaborative filtering via dynamic negative item sampling
Weinan Zhang, Tianqi Chen, Jun Wang, Yong Yu
Pages: 785-788
doi>10.1145/2484028.2484126
Full text: PDF

Collaborative filtering techniques rely on aggregated user preference data to make personalized predictions. In many cases, users are reluctant to explicitly express their preferences and many recommender systems have to infer them from implicit user ...
SESSION: Short papers 1 -- multimedia IR
Towards retrieving relevant information graphics
Zhuo Li, Matthew Stagitis, Sandra Carberry, Kathleen F. McCoy
Pages: 789-792
doi>10.1145/2484028.2484164
Full text: PDF

Information retrieval research has made significant progress in the retrieval of text documents and images. However, relatively little attention has been given to the retrieval of information graphics (non-pictorial images such as bar charts and line ...
Hybrid retrieval approaches to geospatial music recommendation
Markus Schedl, Dominik Schnitzer
Pages: 793-796
doi>10.1145/2484028.2484146
Full text: PDF

Recent advances in music retrieval and recommendation algorithms highlight the necessity to follow multimodal approaches in order to transcend limits imposed by methods that solely use audio, web, or collaborative filtering data. In this paper, we propose ...
Leveraging viewer comments for mood classification of music video clips
Takehiro Yamamoto, Satoshi Nakamura
Pages: 797-800
doi>10.1145/2484028.2484118
Full text: PDF

This short paper proposes a method to classify music video clips uploaded to a video sharing service into music mood categories such as 'cheerful,' 'wistful,' and 'aggressive.' The method leverages viewer comments posted to the music video clips for ...
SESSION: Short papers 1 -- queries and query analysis
Exploiting semantics for improving clinical information retrieval
Atanaz Babashzadeh, Jimmy Huang, Mariam Daoud
Pages: 801-804
doi>10.1145/2484028.2484167
Full text: PDF

Clinical information retrieval (IR) presents several challenges including terminology mismatch and granularity mismatch. One of the main objectives in clinical IR is to fill the semantic gap between queries and documents and go beyond keyword matching. ...
Interpretation of coordinations, compound generation, and result fusion for query variants
Johannes Leveling
Pages: 805-808
doi>10.1145/2484028.2484115
Full text: PDF

We investigate interpreting coordinations (e.g. word sequences connected with coordinating conjunctions such as "and" and "or") as logical disjunctions of terms to generate a set of disjunction-free query variants for information retrieval (IR) queries. ...
Time-aware structured query suggestion
Taiki Miyanishi, Tetsuya Sakai
Pages: 809-812
doi>10.1145/2484028.2484143
Full text: PDF

Most commercial search engines have a query suggestion feature, which is designed to capture various possible search intents behind the user's original query. However, even though different search intents behind a given query may have been popular at ...
Flat vs. hierarchical phrase-based translation models for cross-language information retrieval
Ferhan Ture, Jimmy Lin
Pages: 813-816
doi>10.1145/2484028.2484137
Full text: PDF

Although context-independent word-based approaches remain popular for cross-language information retrieval, many recent studies have shown that integrating insights from modern statistical machine translation systems can lead to substantial improvements ...
Here and there: goals, activities, and predictions about location from geotagged queries
Robert West, Ryen W. White, Eric Horvitz
Pages: 817-820
doi>10.1145/2484028.2484125
Full text: PDF

A significant portion of Web search is performed in mobile settings. We explore the links between users' queries on mobile devices and their locations and movement, with a focus on interpreting queries about addresses. We find that users tend to have ...
Query change as relevance feedback in session search
Sicong Zhang, Dongyi Guan, Hui Yang
Pages: 821-824
doi>10.1145/2484028.2484171
Full text: PDF

Session search is the Information Retrieval (IR) task that performs document retrieval for an entire session. During a session, users often change queries to explore and investigate their information needs. In this paper, we propose to use query change ...
SESSION: Short papers 1 -- retrieval models and ranking
Is uncertain logical-matching equivalent to conditional probability?
Karam Abdulahhad, Jean-Pierre Chevallet, Catherine Berrut
Pages: 825-828
doi>10.1145/2484028.2484152
Full text: PDF

Logic-based Information Retrieval (IR) models represent the retrieval decision as a logical implication d->q between a document d and a query q, where d and q are logical sentences. However, d->q is a binary decision; we thus need a measure to ...
Boosting novelty for biomedical information retrieval through probabilistic latent semantic analysis
Xiangdong An, Jimmy Xiangji Huang
Pages: 829-832
doi>10.1145/2484028.2484174
Full text: PDF

In information retrieval, we are interested in the information that is not only relevant but also novel. In this paper, we study how to boost novelty for biomedical information retrieval through probabilistic latent semantic analysis. We conduct the ...
Learning to combine representations for medical records search
Nut Limsopatham, Craig Macdonald, Iadh Ounis
Pages: 833-836
doi>10.1145/2484028.2484177
Full text: PDF

The complexity of medical terminology raises challenges when searching medical records. For example, 'cancer', 'tumour', and 'neoplasms', which are synonyms, may prevent a traditional search system from retrieving relevant records that contain only synonyms ...
Kinship contextualization: utilizing the preceding and following structural elements
Muhammad A. Norozi, Paavo Arvola
Pages: 837-840
doi>10.1145/2484028.2484111
Full text: PDF

The textual context of an element, structurally, contains traces of evidence. Utilizing this context in scoring is called contextualization. In this study we hypothesize that the context of an XML-element originated from its preceding ...
The cluster hypothesis for entity oriented search
Hadas Raviv, Oren Kurland, David Carmel
Pages: 841-844
doi>10.1145/2484028.2484128
Full text: PDF

In this work we study the cluster hypothesis for entity oriented search (EOS). Specifically, we show that the hypothesis can hold to a substantial extent for several entity similarity measures. We also demonstrate the retrieval effectiveness merits of ...
Self reinforcement for important passage retrieval
Ricardo Ribeiro, Luís Marujo, David Martins de Matos, João P. Neto, Anatole Gershman, Jaime Carbonell
Pages: 845-848
doi>10.1145/2484028.2484134
Full text: PDF

In general, centrality-based retrieval models treat all elements of the retrieval space equally, which may reduce their effectiveness. In the specific context of extractive summarization (or important passage retrieval), this means that these models ...
What can pictures tell us about web pages?: improving document search using images
Sergio Rodriguez-Vaamonde, Lorenzo Torresani, Andrew Fitzgibbon
Pages: 849-852
doi>10.1145/2484028.2484144
Full text: PDF

Traditional Web search engines do not use the images in the HTML pages to find relevant documents for a given query. Instead, they typically operate by computing a measure of agreement between the keywords provided by the user and only the text portion ...
Estimating query representativeness for query-performance prediction
Mor Sondak, Anna Shtok, Oren Kurland
Pages: 853-856
doi>10.1145/2484028.2484107
Full text: PDF

The query-performance prediction (QPP) task is to estimate retrieval effectiveness with no relevance judgments. We present a novel probabilistic framework for QPP that gives rise to an important aspect that was not addressed in previous work; namely, ...
Interoperability ranking for mobile applications
Dragomir Yankov, Pavel Berkhin, Rajen Subba
Pages: 857-860
doi>10.1145/2484028.2484122
Full text: PDF

At present, most major app marketplaces perform ranking and recommendation based on search relevance features or marketplace "popularity" statistics. For instance, they check similarity between app descriptions and user search queries, or rank-order ...
SESSION: Short papers 1 -- social media IR
Sopra: a new social personalized ranking function for improving web search
Mohamed Reda Bouadjenek, Hakim Hacid, Mokrane Bouzeghoub
Pages: 861-864
doi>10.1145/2484028.2484131
Full text: PDF

We present in this paper a contribution to IR modeling by proposing a new ranking function called SoPRa that considers the social dimension of the Web. This social dimension is any social information that surrounds documents along with the social context ...
Browse with a social web directory
Hao Huang, Yunjun Gao, Lu Chen, Rui Li, Kevin Chiew, Qinming He
Pages: 865-868
doi>10.1145/2484028.2484141
Full text: PDF

Browsing with either web directories or social bookmarks is an important complement to keyword search in web information retrieval. To improve users' browsing experiences and facilitate web directory construction, in this paper, we propose a ...
Who will retweet me?: finding retweeters in twitter
Zhunchen Luo, Miles Osborne, Jintao Tang, Ting Wang
Pages: 869-872
doi>10.1145/2484028.2484158
Full text: PDF

An important aspect of communication in Twitter (and other social networks) is message propagation -- people creating posts for others to share. Although there has been work on modelling how tweets in Twitter are propagated (retweeted), an untackled problem ...
A financial cost metric for result caching
Fethi Burak Sazoglu, B. Barla Cambazoglu, Rifat Ozcan, Ismail Sengor Altingovde, Özgür Ulusoy
Pages: 873-876
doi>10.1145/2484028.2484182
Full text: PDF

Web search engines cache results of frequent and/or recent queries. Result caching strategies can be evaluated using different metrics, hit rate being the most well-known. Recent works take the processing overhead of queries into account when evaluating ...
SESSION: Short papers 1 -- topic models
Document classification by topic labeling
Swapnil Hingmire, Sandeep Chougule, Girish K. Palshikar, Sutanu Chakraborti
Pages: 877-880
doi>10.1145/2484028.2484140
Full text: PDF

In this paper, we propose a Latent Dirichlet Allocation (LDA) [1] based document classification algorithm which does not require any labeled dataset. In our algorithm, we construct a topic model using LDA, assign one topic to one of the class labels, aggregate ...
Mining web search topics with diverse spatiotemporal patterns
Di Jiang, Wilfred Ng
Pages: 881-884
doi>10.1145/2484028.2484124
Full text: PDF

Mining the latent topics from web search data and capturing their spatiotemporal patterns have many applications in information retrieval. As web search is heavily influenced by the spatial and temporal factors, the latent topics usually demonstrate ...
A novel topic model for automatic term extraction
Sujian Li, Jiwei Li, Tao Song, Wenjie Li, Baobao Chang
Pages: 885-888
doi>10.1145/2484028.2484106
Full text: PDF

Automatic term extraction (ATE) aims at extracting domain-specific terms from a corpus of a certain domain. Termhood is one essential measure for judging whether a phrase is a term. Previous research on termhood mainly depends on the word frequency ...
Improving LDA topic models for microblogs via tweet pooling and automatic labeling
Rishabh Mehrotra, Scott Sanner, Wray Buntine, Lexing Xie
Pages: 889-892
doi>10.1145/2484028.2484166
Full text: PDF

Twitter, or the world of 140 characters, poses serious challenges to the efficacy of topic models on short, messy text. While topic models such as Latent Dirichlet Allocation (LDA) have a long history of successful application to news articles and academic ...
SESSION: Short papers 1 -- users and interactive IR
Extractive summarisation via sentence removal: condensing relevant sentences into a short summary
Marco Bonzanini, Miguel Martinez-Alvarez, Thomas Roelleke
Pages: 893-896
doi>10.1145/2484028.2484149
Full text: PDF

Many on-line services allow users to describe their opinions about a product or a service through a review. In order to help other users to find out the major opinion about a given topic, without the effort to read several reviews, multi-document summarisation ...
Characterizing stages of a multi-session complex search task through direct and indirect query modifications
Jiyin He, Marc Bron, Arjen P. de Vries
Pages: 897-900
doi>10.1145/2484028.2484178
Full text: PDF

Search systems use context to effectively satisfy a user's information need as expressed by a query. Tasks are important factors in determining user context during search and many studies have been conducted that identify tasks and task stages through ...
Displaying relevance scores for search results
Guy Shani, Noam Tractinsky
Pages: 901-904
doi>10.1145/2484028.2484112
Full text: PDF

Internet search engines typically compute a relevance score for webpages given the query terms, and then rank the pages by decreasing relevance scores. The popular search engines do not, however, present the relevance scores that were computed during ...
Studying page life patterns in dynamical web
Alexey Tikhonov, Ivan Bogatyy, Pavel Burangulov, Liudmila Ostroumova, Vitaliy Koshelev, Gleb Gusev
Pages: 905-908
doi>10.1145/2484028.2484185
Full text: PDF

With the ever-increasing speed of content turnover on the web, it is particularly important to understand the patterns that pages' popularity follows. This paper focuses on the dynamical part of the web, i.e. pages that have a limited lifespan and experience ...
SESSION: Short papers 2 -- evaluation
A document rating system for preference judgements
Maryam Bashir, Jesse Anderton, Jie Wu, Peter B. Golbus, Virgil Pavlu, Javed A. Aslam
Pages: 909-912
doi>10.1145/2484028.2484170
Full text: PDF

High quality relevance judgments are essential for the evaluation of information retrieval systems. Traditional methods of collecting relevance judgments are based on collecting binary or graded nominal judgments, but such judgments are limited by factors ...
Relevance dimensions in preference-based IR evaluation
Jinyoung Kim, Gabriella Kazai, Imed Zitouni
Pages: 913-916
doi>10.1145/2484028.2484168
Full text: PDF

Evaluation of information retrieval (IR) systems has recently been exploring the use of preference judgments over two search result lists. Unlike the traditional method of collecting relevance labels per single result, this method makes it possible to consider ...
Composition of TF normalizations: new insights on scoring functions for ad hoc IR
François Rousseau, Michalis Vazirgiannis
Pages: 917-920
doi>10.1145/2484028.2484121
Full text: PDF

Previous papers in ad hoc IR reported that scoring functions should satisfy a set of heuristic retrieval constraints, providing a mathematical justification for the normalizations historically applied to the term frequency (TF). In this paper, we propose ...
The impact of intent selection on diversified search evaluation
Tetsuya Sakai, Zhicheng Dou, Charles L.A. Clarke
Pages: 921-924
doi>10.1145/2484028.2484105
Full text: PDF

To construct a diversified search test collection, a set of possible subtopics (or intents) needs to be determined for each topic, in one way or another, and per-intent relevance assessments need to be obtained. In the TREC Web Track Diversity Task, subtopics ...
A comparison of the optimality of statistical significance tests for information retrieval evaluation
Julián Urbano, Mónica Marrero, Diego Martín
Pages: 925-928
doi>10.1145/2484028.2484163
Full text: PDF

Previous research has suggested the permutation test as the theoretically optimal statistical significance test for IR evaluation, and advocated for the discontinuation of the Wilcoxon and sign tests. We present a large-scale study comprising nearly ...
Assessor disagreement and text classifier accuracy
William Webber, Jeremy Pickens
Pages: 929-932
doi>10.1145/2484028.2484156
Full text: PDF

Text classifiers are frequently used for high-yield retrieval from large corpora, such as in e-discovery. The classifier is trained by annotating example documents for relevance. These examples may, however, be assessed by people other than those whose ...
Sequential testing in classifier evaluation yields biased estimates of effectiveness
William Webber, Mossaab Bagdouri, David D. Lewis, Douglas W. Oard
Pages: 933-936
doi>10.1145/2484028.2484159
Full text: PDF

It is common to develop and validate classifiers through a process of repeated testing, with nested training and/or test sets of increasing size. We demonstrate in this paper that such repeated testing leads to biased estimates of classifier effectiveness. ...
Relating retrievability, performance and length
Colin Wilkie, Leif Azzopardi
Pages: 937-940
doi>10.1145/2484028.2484145

Retrievability provides a different way to evaluate an Information Retrieval (IR) system as it focuses on how easily documents can be found. It is intrinsically related to retrieval performance because a document needs to be retrieved before it can be ...
SESSION: Short papers 2 -- filtering and recommending
Cumulative citation recommendation: classification vs. ranking
Krisztian Balog, Heri Ramampiaro
Pages: 941-944
doi>10.1145/2484028.2484151

Cumulative citation recommendation refers to the task of filtering a time-ordered corpus for documents that are highly relevant to a predefined set of entities. This task was introduced at the TREC Knowledge Base Acceleration track in 2012, where ...
Tagcloud-based explanation with feedback for recommender systems
Wei Chen, Wynne Hsu, Mong Li Lee
Pages: 945-948
doi>10.1145/2484028.2484108

Personalized recommender systems aim to push only the relevant items and information directly to the users without requiring them to browse through millions of web resources. The challenge of these systems is to achieve a high user acceptance rate on ...
Collaborative factorization for recommender systems
Chaosheng Fan, Yanyan Lan, Jiafeng Guo, Zuoquan Lin, Xueqi Cheng
Pages: 949-953
doi>10.1145/2484028.2484176

Recommender systems have become an effective tool for information filtering, usually providing users with the most useful items in a top-k ranked list. Traditional recommendation techniques such as Nearest Neighbors (NN) and Matrix Factorization (MF) ...
RecSys for distributed events: investigating the influence of recommendations on visitor plans
Richard Schaller, Morgan Harvey, David Elsweiler
Pages: 953-956
doi>10.1145/2484028.2484119

Distributed events are collections of events taking place within a small area over the same time period and relating to a single topic. There are often a large number of events on offer and the times in which they can be visited are heavily constrained, ...
SESSION: Short papers 2 -- multimedia IR
Ranking-oriented nearest-neighbor based method for automatic image annotation
Chaoran Cui, Jun Ma, Tao Lian, Xiaofang Wang, Zhaochun Ren
Pages: 957-960
doi>10.1145/2484028.2484113

Automatic image annotation plays a critical role in keyword-based image retrieval systems. Recently, the nearest-neighbor based scheme has been proposed and achieved good performance for image annotation. Given a new image, the scheme is to first find ...
Linking transcribed conversational speech
Joseph Malionek, Douglas W. Oard, Abhijeet Sangwan, John H.L. Hansen
Pages: 961-964
doi>10.1145/2484028.2484136

As large collections of historically significant recorded speech become increasingly available, scholars are faced with the challenge of making sense of what they hear. This paper proposes automatically linking conversational speech to related resources ...
On contextual photo tag recommendation
Philip J. McParlane, Yashar Moshfeghi, Joemon M. Jose
Pages: 965-968
doi>10.1145/2484028.2484160

Image tagging is a growing application on social media websites; however, the performance of many auto-tagging methods is often poor. Recent work has exploited an image's context (e.g., time and location) in the tag recommendation process, where tags ...
The knowing camera: recognizing places-of-interest in smartphone photos
Pai Peng, Lidan Shou, Ke Chen, Gang Chen, Sai Wu
Pages: 969-972
doi>10.1145/2484028.2484173

This paper presents a framework called Knowing Camera for recognizing places-of-interest in smartphone photos in real time, given the availability of online geotagged images of such places. We propose a probabilistic field-of-view model which captures the ...
SESSION: Short papers 2 -- queries and query analysis
Question retrieval with user intent
Long Chen, Dell Zhang, Mark Levene
Pages: 973-976
doi>10.1145/2484028.2484129

Community Question Answering (CQA) services, such as Yahoo! Answers and WikiAnswers, have become popular with users as one of the central paradigms for satisfying users' information needs. The task of question retrieval in CQA aims to resolve one's query ...
Mapping queries to questions: towards understanding users' information needs
Yunjun Gao, Lu Chen, Rui Li, Gang Chen
Pages: 977-980
doi>10.1145/2484028.2484138

In this paper, for the first time, we study the problem of mapping keyword queries to questions on community-based question answering (CQA) sites. Mapping general web queries to questions enables search engines not only to discover explicit and specific ...
From keywords to keyqueries: content descriptors for the web
Tim Gollub, Matthias Hagen, Maximilian Michel, Benno Stein
Pages: 981-984
doi>10.1145/2484028.2484181

We introduce the concept of keyqueries as dynamic content descriptors for documents. Keyqueries are defined implicitly by the index and the retrieval model of a reference search engine: keyqueries for a document are the minimal queries that return the ...
Commodity query by snapping
Hao Huang, Yunjun Gao, Kevin Chiew, Qinming He, Lu Chen
Pages: 985-988
doi>10.1145/2484028.2484120

Commodity information such as prices and public reviews is always a concern of consumers. Helping them conveniently acquire this information as an instant reference is often of practical significance for their purchase activities. Nowadays, Web 2.0, ...
Temporal variance of intents in multi-faceted event-driven information needs
Stewart Whiting, Ke Zhou, Joemon Jose, Mounia Lalmas
Pages: 989-992
doi>10.1145/2484028.2484169

Time is often important for understanding user intent during search activity, especially for information needs related to event-driven topics. Diversity for multi-faceted information needs ensures that ranked documents optimally cover multiple facets ...
Pursuing insights about healthcare utilization via geocoded search queries
Shuang-Hong Yang, Ryen W. White, Eric Horvitz
Pages: 993-996
doi>10.1145/2484028.2484147

Mobile devices provide people with a conduit to the rich information resources of the Web. With consent, the devices can also provide streams of information about search activity and location that can be used in population studies and real-time assistance. ...
SESSION: Short papers 2 -- retrieval models and ranking
Effectiveness/efficiency tradeoffs for candidate generation in multi-stage retrieval architectures
Nima Asadi, Jimmy Lin
Pages: 997-1000
doi>10.1145/2484028.2484132

This paper examines a multi-stage retrieval architecture consisting of a candidate generation stage, a feature extraction stage, and a reranking stage using machine-learned models. Given a fixed set of features and a learning-to-rank model, we explore ...
Estimating topical context by diverging from external resources
Romain Deveaud, Eric SanJuan, Patrice Bellot
Pages: 1001-1004
doi>10.1145/2484028.2484148

Improving query understanding is crucial for providing the user with information that suits her needs. To this end, the retrieval system must be able to deal with several sources of knowledge from which it could infer a topical context. The use of external ...
Finding knowledgeable groups in enterprise corpora
Shangsong Liang, Maarten de Rijke
Pages: 1005-1008
doi>10.1145/2484028.2484109

The task of finding groups is a natural extension of search tasks aimed at retrieving individual entities. We introduce a group finding task: given a query topic, find knowledgeable groups that have expertise on that topic. We present four general strategies ...
Neighbourhood preserving quantisation for LSH
Sean Moran, Victor Lavrenko, Miles Osborne
Pages: 1009-1012
doi>10.1145/2484028.2484162

We introduce a scheme for optimally allocating multiple bits per hyperplane for Locality Sensitive Hashing (LSH). Existing approaches binarise LSH projections by thresholding at zero yielding a single bit per dimension. We demonstrate that this is a ...
Shame to be sham: addressing content-based grey hat search engine optimization
Fiana Raiber, Kevyn Collins-Thompson, Oren Kurland
Pages: 1013-1016
doi>10.1145/2484028.2484135

We present an initial study identifying a form of content-based grey hat search engine optimization, in which a Web page contains both potentially relevant content and manipulated content: we call such pages sham documents, because they lie in the grey ...
IRWR: incremental random walk with restart
Weiren Yu, Xuemin Lin
Pages: 1017-1020
doi>10.1145/2484028.2484114

Random Walk with Restart (RWR) has become an appealing measure of node proximities in emerging applications, e.g., recommender systems and automatic image captioning. In practice, a real graph is typically large, and is frequently updated with small changes. ...
Bias-variance decomposition of IR evaluation
Peng Zhang, Dawei Song, Jun Wang, Yuexian Hou
Pages: 1021-1024
doi>10.1145/2484028.2484127

It has been recognized that, when an information retrieval (IR) system achieves improvement in mean retrieval effectiveness (e.g. mean average precision (MAP)) over all the queries, the performance (e.g., average precision (AP)) of some individual queries ...
An adaptive evidence weighting method for medical record search
Dongqing Zhu, Ben Carterette
Pages: 1025-1028
doi>10.1145/2484028.2484175

In this paper, we present a medical record search system which is useful for identifying cohorts required in clinical studies. In particular, we propose a query-adaptive weighting method that can dynamically aggregate and score evidence in multiple medical ...
Fresh BrowseRank
Maxim Zhukovskiy, Andrei Khropov, Gleb Gusev, Pavel Serdyukov
Pages: 1029-1032
doi>10.1145/2484028.2484186

In recent years, the problem of computing page authority from user browsing behavior has attracted a lot of attention. However, the proposed methods have a number of limitations. In particular, they run on a single snapshot of a user browsing ...
SESSION: Short papers 2 -- social media IR
Competition-based networks for expert finding
Çiğdem Aslay, Neil O'Hare, Luca Maria Aiello, Alejandro Jaimes
Pages: 1033-1036
doi>10.1145/2484028.2484183

Finding experts in question answering platforms has important applications, such as question routing or identification of best answers. Addressing the problem of ranking users with respect to their expertise, we propose Competition-Based Expertise Networks ...
A study on the accuracy of Flickr's geotag data
Claudia Hauff
Pages: 1037-1040
doi>10.1145/2484028.2484154

Obtaining geographically tagged multimedia items from social Web platforms such as Flickr is beneficial for a variety of applications including the automatic creation of travelogues and personalized travel recommendations. In order to take advantage ...
Finding impressive social content creators: searching for SNS illustrators using feedback on motifs and impressions
Yohei Seki, Kiyoto Miyajima
Pages: 1041-1044
doi>10.1145/2484028.2484133

We propose a method for finding impressive creators in online social network sites (SNSs). Many users are actively engaged in publishing their own works, sharing visual content on sites such as YouTube or Flickr. In this paper, we focus on the Japanese ...
Informational friend recommendation in social media
Shengxian Wan, Yanyan Lan, Jiafeng Guo, Chaosheng Fan, Xueqi Cheng
Pages: 1045-1048
doi>10.1145/2484028.2484179

It is well recognized that users rely on social media (e.g., Twitter or Digg) to fulfill two common needs, a social need and an informational need: to keep in touch with their friends in the real world and to have access to information they are ...
SESSION: Short papers 2 -- topic models
Using social annotations to enhance document representation for personalized search
Mohamed Reda Bouadjenek, Hakim Hacid, Mokrane Bouzeghoub, Athena Vakali
Pages: 1049-1052
doi>10.1145/2484028.2484130

In this paper, we present a contribution to IR modeling. We propose an approach that computes, on the fly, a Personalized Social Document Representation (PSDR) of each document per user, based on the user's social activities. The PSDRs are used to rank documents ...
The bag-of-repeats representation of documents
Matthias Gallé
Pages: 1053-1056
doi>10.1145/2484028.2484142

n-gram representations of documents may improve over a simple bag-of-words representation by relaxing the independence assumption between words and introducing context. However, this comes at the cost of adding features which are non-descriptive, and increasing ...
An LDA-smoothed relevance model for document expansion: a case study for spoken document retrieval
Debasis Ganguly, Johannes Leveling, Gareth J.F. Jones
Pages: 1057-1060
doi>10.1145/2484028.2484110

Document expansion (DE) in information retrieval (IR) involves modifying each document in the collection by introducing additional terms into the document. It is particularly useful to improve retrieval of short and noisy documents where the additional ...
SESSION: Short papers 2 -- users and interactive IR
Timeline generation with social attention
Xin Wayne Zhao, Yanwei Guo, Rui Yan, Yulan He, Xiaoming Li
Pages: 1061-1064
doi>10.1145/2484028.2484103

Timeline generation is an important research task which can help users quickly understand the overall evolution of any given topic. It has thus attracted much attention from research communities in recent years. Nevertheless, existing work on ...
Explicit feedback in local search tasks
Dmitry Lagun, Avneesh Sud, Ryen W. White, Peter Bailey, Georg Buscher
Pages: 1065-1068
doi>10.1145/2484028.2484123

Modern search engines make extensive use of people's contextual information to finesse result rankings. Using a searcher's location provides an especially strong signal for adjusting results for certain classes of queries where people may have clear ...
Ranking explanatory sentences for opinion summarization
Hyun Duk Kim, Malu G. Castellanos, Meichun Hsu, ChengXiang Zhai, Umeshwar Dayal, Riddhiman Ghosh
Pages: 1069-1072
doi>10.1145/2484028.2484172

We introduce a novel sentence ranking problem called explanatory sentence extraction (ESE) which aims to rank sentences in opinionated text based on their usefulness for helping users understand the detailed reasons behind sentiments (i.e., "explanatoriness"). ...
#trapped!: social media search system requirements for emergency management professionals
Stefan Raue, Leif Azzopardi, Chris W. Johnson
Pages: 1073-1076
doi>10.1145/2484028.2484184

Social media provides a new and potentially rich source of information for emergency management services. However, extracting the relevant information from such streams poses a number of difficult challenges. In this short paper, we survey emergency ...
DEMONSTRATION SESSION: Demonstrations 1 -- Users and interactive IR
ThemeStreams: visualizing the stream of themes discussed in politics
Ork de Rooij, Daan Odijk, Maarten de Rijke
Pages: 1077-1078
doi>10.1145/2484028.2484215

The political landscape is fluid. Discussions are always ongoing and new "hot topics" continue to appear in the headlines. But what made people start talking about that topic? And who started it? Because of the speed at which discussions sometimes take ...
BATC: a benchmark for aggregation techniques in crowdsourcing
Quoc Viet Hung Nguyen, Thanh Tam Nguyen, Ngoc Tran Lam, Karl Aberer
Pages: 1079-1080
doi>10.1145/2484028.2484199

As the volume of AI problems involving human knowledge is likely to soar, crowdsourcing has become essential in a wide range of Web applications. One of the biggest challenges of crowdsourcing is aggregating the answers collected from crowd ...
Spacious: an interactive mental search interface
Phong D. Vo, Hichem Sahbi
Pages: 1081-1082
doi>10.1145/2484028.2484203

We introduce in this work a novel approach for semantic indexing and mental image search. Given semantic concepts defined by a few training examples, our formulation is transductive and learns a mapping from an initial ambient space, related to low level ...
DEMONSTRATION SESSION: Demonstrations 1 -- IR and structured data
Flex-BaseX: an XML engine with a flexible extension of XQuery Full-Text
Emanuele Panzeri, Gabriella Pasi
Pages: 1083-1084
doi>10.1145/2484028.2484216

XML is the most widely used language for structuring data and documents, besides being the de facto standard for data exchange. Keyword-based search has been implemented by the XQuery Full-Text language extension, allowing document fragments to be retrieved ...
ProductSeeker: entity-based product retrieval for e-commerce
Hongzhi Wang, Xiaodong Zhang, Jianzhong Li, Hong Gao
Pages: 1085-1086
doi>10.1145/2484028.2484205

Retrieval results for online product information on e-commerce websites are often difficult for users to use because the same product may be described in different ways. This paper proposes ProductSeeker, a product retrieval system organizing results ...
DEMONSTRATION SESSION: Demonstrations 1 -- information extraction
Live nuggets extractor: a semi-automated system for text extraction and test collection creation
Matthew Ekstrand-Abueg, Virgil Pavlu, Javed A. Aslam
Pages: 1087-1088
doi>10.1145/2484028.2484211

The Live Nugget Extractor system provides users with a method of efficiently and accurately collecting relevant information for any web query rather than providing a simple ranked list of documents. The system utilizes an online learning procedure to ...
X-ENS: semantic enrichment of web search results at real-time
Pavlos Fafalios, Yannis Tzitzikas
Pages: 1089-1090
doi>10.1145/2484028.2484200

While more and more semantic data are published on the Web, an important question is how typical web users can access and exploit this body of knowledge. Although existing interaction paradigms in semantic search hide the complexity behind an easy-to-use ...
Accurate and robust text detection: a step-in for text retrieval in natural scene images
Xu-Cheng Yin, Xuwang Yin, Kaizhu Huang, Hong-Wei Hao
Pages: 1091-1092
doi>10.1145/2484028.2484197

We propose and implement a robust text detection system, which is a prominent step-in for text retrieval in natural scene images or videos. Our system includes several key components: (1) A fast and effective pruning algorithm is designed to extract ...
DEMONSTRATION SESSION: Demonstrations 1 -- filtering and recommending
A framework for specific term recommendation systems
Thomas Lüke, Philipp Schaer, Philipp Mayr
Pages: 1093-1094
doi>10.1145/2484028.2484207

In this paper we present the IRSA framework that enables the automatic creation of search term suggestion or recommendation systems (TS). Such TS are used to operationalize interactive query expansion and help users in refining their information need ...
TweetMogaz: a news portal of tweets
Walid Magdy
Pages: 1095-1096
doi>10.1145/2484028.2484212

Twitter is currently one of the largest social hubs for users to spread and discuss news. For most of the top news stories happening, there are corresponding discussions on social media. In this demonstration TweetMogaz is presented, which is a platform ...
DEMONSTRATION SESSION: Demonstrations 1 -- classification and clustering
InfoLand: information lay-of-land for session search
Jiyun Luo, Dongyi Guan, Hui Yang
Pages: 1097-1098
doi>10.1145/2484028.2484213

Search result clustering (SRC) is a post-retrieval process that hierarchically organizes search results. The hierarchical structure offers an overview of the search results and displays an "information lay-of-land" that intends to guide the users throughout ...
A portable multilingual medical directory by automatic categorization of Wikipedia articles
Fernando Ruiz-Rico, María-Consuelo Rubio-Sánchez, David Tomás, Jose-Luis Vicedo
Pages: 1099-1100
doi>10.1145/2484028.2484217

Wikipedia has become one of the most important sources of information available all over the world. However, the categorization of Wikipedia articles is not standardized and the searches are mainly performed on keywords rather than concepts. In this ...
DEMONSTRATION SESSION: Demonstrations 2 -- users and interactive IR
A geolinguistic web application based on linked open data
Emanuele Di Buccio, Giorgio Maria Di Nunzio, Gianmaria Silvello
Pages: 1101-1102
doi>10.1145/2484028.2484219

Digital Geolinguistic systems encourage collaboration between linguists, historians, archaeologists, ethnographers, as they explore the relationship between language and cultural adaptation and change. In this demo, we propose a Linked Open Data approach ...
TopicVis: a GUI for topic-based feedback and navigation
Debasis Ganguly, Manisha Ganguly, Johannes Leveling, Gareth J.F. Jones
Pages: 1103-1104
doi>10.1145/2484028.2484202

This paper describes a search system which includes topic model visualization to improve the user search experience. The system graphically renders the topics in a retrieved set of documents, enables a user to selectively refine search results and allows ...
Information seeking in digital cultural heritage with PATHS
Mark M. Hall, Paul D. Clough, Samuel Fernando, Paula Goodale, Mark Stevenson, Eneko Agirre, Arantxa Otegi, Aitor Soroa, Kate Fernie, Jillian Griffiths, Runar Bergheim
Pages: 1105-1106
doi>10.1145/2484028.2484210

Current Information Retrieval systems for digital cultural heritage support only the actual search aspect of the information seeking process. This demonstration presents the second PATHS system which provides the exploration, analysis, and sense-making ...
DEMONSTRATION SESSION: Demonstrations 2 -- IR and structured data
Answering natural language queries over linked data graphs: a distributional semantics approach
André Freitas, Fabrício F. de Faria, Seán O'Riain, Edward Curry
Pages: 1107-1108
doi>10.1145/2484028.2484209

This paper demonstrates Treo, a natural language query mechanism for Linked Data graphs. The approach uses a distributional semantic vector space model to semantically match user query terms with data, supporting vocabulary-independent (or ...
Removing the mismatch headache in XML keyword search
Yong Zeng, Zhifeng Bao, Tok Wang Ling, Guoliang Li
Pages: 1109-1110
doi>10.1145/2484028.2484218

In this demo, we study one category of query refinement problems in the context of XML keyword search, where what users search for does not exist in the data, yet the search engine still returns useless results. It is a hidden but important problem. ...
YaLi: a crowdsourcing plug-in for NERD
Yafang Wang, Lili Jiang, Johannes Hoffart, Gerhard Weikum
Pages: 1111-1112
doi>10.1145/2484028.2484206

We demonstrate the YaLi browser plug-in which discovers named entities in Web pages and provides background knowledge about them. The plug-in is implemented with two purposes. From a user perspective, it enriches the browsing experience with entities, ...
DEMONSTRATION SESSION: Demonstrations 2 -- information extraction
SearchResultFinder: federated search made easy
Dolf Trieschnigg, Kien Tjin-Kam-Jet, Djoerd Hiemstra
Pages: 1113-1114
doi>10.1145/2484028.2484198

Building a federated search engine based on a large number of existing web search engines is a challenge: implementing the programming interface (API) for each search engine is an exacting and time-consuming job. In this demonstration we present SearchResultFinder, ...
DEMONSTRATION SESSION: Demonstrations 2 -- filtering and recommending
Online matching of web content to closed captions in IntoNow
Carlos Castillo, Gianmarco De Francisci Morales, Ajay Shekhawat
Pages: 1115-1116
doi>10.1145/2484028.2484204

IntoNow is a mobile application that provides a second-screen experience to television viewers. IntoNow uses the microphone of the companion device to sample the audio coming from the TV set, and compares it against a database of TV shows in order to ...
Match the news: a firefox extension for real-time news recommendation
Margarita Karkali, Dimitris Pontikis, Michalis Vazirgiannis
Pages: 1117-1118
doi>10.1145/2484028.2484208

We present Match the News, a browser extension for real-time news recommendation. Our extension works on the client side to recommend, in real time, recently published articles that are relevant to the web page the user is currently visiting. Match ...
DEMONSTRATION SESSION: Demonstrations 2 -- classification and clustering
Demonstration of citation pattern analysis for plagiarism detection
Bela Gipp, Norman Meuschke, Corinna Breitinger, Mario Lipinski, Andreas Nürnberger
Pages: 1119-1120
doi>10.1145/2484028.2484214
A multilingual and multiplatform application for medicinal plants prescription from medical symptoms
Fernando Ruiz-Rico, David Tomás, Jose-Luis Vicedo, María-Consuelo Rubio-Sánchez
Pages: 1121-1122
doi>10.1145/2484028.2484201

This paper presents an application for medicinal plant prescription based on text classification techniques. The system receives as input a free-text description of a user's symptoms, and retrieves a ranked list of medicinal plants related to those ...
TUTORIAL SESSION: Tutorials
Searching in the city of knowledge: challenges and recent developments
Veli Bicer, Vanessa Lopez
Pages: 1123-1123
doi>10.1145/2484028.2484195

Today plenty of data is emerging from various city systems. Beyond the classical Web resources, large amounts of data are retrieved from sensors, devices, social networks, governmental applications, or service networks. In such a diversity of information, ...
Scalability and efficiency challenges in commercial web search engines
B. Barla Cambazoglu, Ricardo Baeza-Yates
Pages: 1124-1124
doi>10.1145/2484028.2484189

Commercial web search engines rely on very large compute infrastructures to be able to cope with the continuous growth of the Web and user bases. Achieving scalability and efficiency in such large-scale search engines requires making careful architectural ...
Music similarity and retrieval
Peter Knees, Markus Schedl
Pages: 1125-1125
doi>10.1145/2484028.2484193

This tutorial serves as an introductory course to the field and state of the art of music information retrieval (MIR), and in particular to music similarity estimation, which is an essential component of music retrieval. Apart from explaining approaches ...
The cluster hypothesis in information retrieval
Oren Kurland
Pages: 1126-1126
doi>10.1145/2484028.2484192
Entity linking and retrieval
Edgar Meij, Krisztian Balog, Daan Odijk
Pages: 1127-1127
doi>10.1145/2484028.2484188

This full-day tutorial presents a comprehensive introduction to entity linking and retrieval. Part I provides a detailed overview of entity linking: identifying and disambiguating entity occurrences in unstructured text. Part II focuses on entity retrieval, ...
Kernel-based learning to rank with syntactic and semantic structures
Alessandro Moschitti
Pages: 1128-1128
doi>10.1145/2484028.2484196

Kernel Methods (KMs) are powerful machine learning techniques that can alleviate the data representation problem, as they substitute the scalar product between feature vectors with similarity functions (kernels) defined directly between data instances, e.g., ...
Designing search usability
Tony Russell-Rose
Pages: 1129-1129
doi>10.1145/2484028.2484191

Search is not just a box and ten blue links. Search is a journey: an exploration where what we encounter along the way changes what we seek. But in order to guide people along this journey, we must understand both the art and science of search experience ...
Diversity and novelty in information retrieval
Rodrygo L.T. Santos, Pablo Castells, Ismail Sengor Altingovde, Fazli Can
Pages: 1130-1130
doi>10.1145/2484028.2484187

This tutorial aims to provide a unifying account of current research on diversity and novelty in different IR domains, namely, in the context of search engines, recommender systems, and data streams.
Multimedia recommendation: technology and techniques
Jialie Shen, Meng Wang, Shuicheng Yan, Peng Cui
Pages: 1131-1131
doi>10.1145/2484028.2484194

In recent years, we have witnessed a rapid growth in the availability of digital multimedia on various application platforms and domains. Consequently, the problem of information overload has become more and more serious. In order to tackle the challenge, ...
Building test collections: an interactive tutorial for students and others without their own evaluation conference series
Ian M. Soboroff
Pages: 1132-1132
doi>10.1145/2484028.2484190

While existing test collections and evaluation conference efforts may sufficiently support one's research, one can easily find oneself wanting to solve problems no one else is solving yet. But how can research in IR be done (or be published!) without ...
WORKSHOP SESSION: Workshops
Workshop on benchmarking adaptive retrieval and recommender systems: BARS 2013
Pablo Castells, Frank Hopfgartner, Alan Said, Mounia Lalmas
Pages: 1133-1133
doi>10.1145/2484028.2484224

Evaluating adaptive and personalized information retrieval techniques is known to be a difficult endeavor. The rapid evolution of novel technologies in this scope raises additional challenges that further stress the need for new evaluation approaches ...
SIGIR 2013 workshop on modeling user behavior for information retrieval evaluation
Charles L.A. Clarke, Luanne Freund, Mark D. Smucker, Emine Yilmaz
Pages: 1134-1134
doi>10.1145/2484028.2484222

The SIGIR 2013 Workshop on Modeling User Behavior for Information Retrieval Evaluation (MUBE 2013) brings together people to discuss existing and new approaches, ways to collaborate, and other ideas and issues involved in improving information retrieval ...
Internet advertising: theory and practice
Bin Gao, Jun Yan, Dou Shen, Tie-Yan Liu
Pages: 1135-1135
doi>10.1145/2484028.2484221

Internet advertising, a form of advertising that utilizes the Internet to deliver marketing messages and attract customers, has seen exponential growth since its inception around twenty years ago; it has been pivotal to the success of the World Wide ...
Exploration, navigation and retrieval of information in cultural heritage: ENRICH 2013
Séamus Lawless, Maristella Agosti, Paul Clough, Owen Conlan
Pages: 1136-1136
doi>10.1145/2484028.2491801

The Exploration, Navigation and Retrieval of Information in Cultural Heritage Workshop (ENRICH 2013) offers a forum to 1) discuss the challenges and opportunities in Information Retrieval research in the area of Cultural Heritage; 2) encourage collaboration ...
SIGIR 2013 workshop on time aware information access (#TAIA2013)
Fernando Diaz, Susan Dumais, Miles Efron, Kira Radinsky, Maarten de Rijke, Milad Shokouhi
Pages: 1137-1137
doi>10.1145/2484028.2491802
Full text: PDF

Web content increasingly reflects the current state of the physical and social world, manifested both in traditional news media sources along with user-generated publishing sites such as Twitter, Foursquare, and Facebook. At the same time, web searching ...
Workshop on health search and discovery: helping users and advancing medicine
Ryen W. White, Elad Yom-Tov, Eric Horvitz, Eugene Agichtein, William Hersh
Pages: 1138-1138
doi>10.1145/2484028.2484220
Full text: PDF

This workshop brings together researchers and practitioners from industry and academia to discuss search and discovery in the medical domain. The event focuses on ways to make medical and health information more accessible to laypeople (including enhancements ...
EuroHCIR2013: the 3rd European workshop on human-computer interaction and information retrieval
Max L. Wilson, Birger Larsen, Preben Hansen, Kristian Norling, Tony Russell-Rose
Pages: 1139-1139
doi>10.1145/2484028.2484223
Full text: PDF

A proposal summary for the EuroHCIR workshop at SIGIR 2013.
SESSION: Doctoral consortium
Beyond relevance: on novelty and diversity in tag recommendation
Fabiano Belém
Pages: 1140-1140
doi>10.1145/2484028.2484229
Full text: PDF

We propose to explicitly exploit issues related to novelty and diversity in tag recommendation tasks, an unexplored research avenue (only relevance issues have been investigated so far), in order to improve user experience and satisfaction. We propose ...
Group-support for task-based information searching: a knowledge-based approach
Thilo Boehm
Pages: 1141-1141
doi>10.1145/2484028.2484235
Full text: PDF
Diversified relevance feedback
Matt Crane
Pages: 1142-1142
doi>10.1145/2484028.2484227
Full text: PDF

The need for a search engine to deal with ambiguous queries (diversification) has been known for a long time. However, it is only recently that this need has become a focus within information retrieval research. How to respond to indications that a result ...
Segmentation strategies for passage retrieval in audio-visual documents
Petra Galuščáková
Pages: 1143-1143
doi>10.1145/2484028.2484237
Full text: PDF

The importance of Information Retrieval (IR) in audio-visual recordings has been increasing with steeply growing numbers of audio-visual documents available on-line. Compared to traditional IR methods, this task requires specific techniques, such as ...
Indexing and querying overlapping structures
Faegheh Hasibi
Pages: 1144-1144
doi>10.1145/2484028.2484234
Full text: PDF

Structural information retrieval is mostly based on hierarchy. However, in real life information is not purely hierarchical and structural elements may overlap each other. The most common example is a document with two distinct structural views, where ...
A query and patient understanding framework for medical records search
Nut Limsopatham
Pages: 1145-1145
doi>10.1145/2484028.2484228
Full text: PDF

Electronic medical records (EMRs) are being increasingly used worldwide to facilitate improved healthcare services [2,3]. They describe the clinical decision process relating to a patient, detailing the observed symptoms, the conducted diagnostic tests, ...
Semantic models for answer re-ranking in question answering
Piero Molino
Pages: 1146-1146
doi>10.1145/2484028.2484233
Full text: PDF

The task of Question Answering (QA) is to find correct answers to users' questions expressed in natural language. In the last few years, non-factoid QA has received more attention. It focuses on causation, manner and reason questions, where the expected answer ...
Task differentiation for personal search evaluation
Seyedeh Sargol Sadeghi
Pages: 1147-1147
doi>10.1145/2484028.2484236
Full text: PDF
The role of current working context in professional search
Maya Sappelli
Pages: 1148-1148
doi>10.1145/2484028.2484231
Full text: PDF

Today's working world of knowledge workers is changing rapidly. The available information that they need to process is ever growing. In addition, the characteristics of their work are changing, as people can and do work from home. This has resulted ...
How far will you go?: characterizing and predicting online search stopping behavior using information scent and need for cognition
Wan-Ching Wu
Pages: 1149-1149
doi>10.1145/2484028.2484232
Full text: PDF
Effective approaches to retrieving and using expertise in social media
Reyyan Yeniterzi
Pages: 1150-1150
doi>10.1145/2484028.2484230
Full text: PDF

Expert retrieval has been widely studied, especially after the introduction of the Expert Finding task in the TREC Enterprise Track in 2005 [3]. This track provided two different test collections crawled from two organizations' public-facing websites and ...

Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Table of Contents
SESSION: Keynote address
Salton award lecture: information retrieval as engineering science
Norbert Fuhr
Pages: 1-2
doi>10.1145/2348283.2348285
Full text: PDF
Retrieving information from the book of humanity: the personalized medicine data tsunami crashes on the beach of jeopardy
Daniel R. Masys
Pages: 3-4
doi>10.1145/2348283.2348286
Full text: PDF

From a mute but eloquent alphabet of 4 characters emerges a complex biological 'literature' whose highest expression is human existence. The rapidly advancing technologies of 'nextgen sequencing' will soon make it possible to inexpensively acquire and ...
SESSION: Query suggestion
Adaptation of the concept hierarchy model with search logs for query recommendation on intranets
Ibrahim Adepoju Adeyanju, Dawei Song, M-Dyaa Albakour, Udo Kruschwitz, Anne De Roeck, Maria Fasli
Pages: 5-14
doi>10.1145/2348283.2348288
Full text: PDF

A concept hierarchy created from a document collection can be used for query recommendation on Intranets by ranking terms according to the strength of their links to the query within the hierarchy. A major limitation is that this model produces the same ...
Adaptive query suggestion for difficult queries
Yang Liu, Ruihua Song, Yu Chen, Jian-Yun Nie, Ji-Rong Wen
Pages: 15-24
doi>10.1145/2348283.2348289
Full text: PDF

Query suggestion is a useful tool to help users formulate better queries. Although this has been found highly useful globally, its effect on different queries may vary. In this paper, we examine the impact of query suggestion on queries of different ...
Learning to suggest: a machine learning framework for ranking query suggestions
Umut Ozertem, Olivier Chapelle, Pinar Donmez, Emre Velipasaoglu
Pages: 25-34
doi>10.1145/2348283.2348290
Full text: PDF

We consider the task of suggesting related queries to users after they issue their initial query to a web search engine. We propose a machine learning approach to learn the probability that a user may find a follow-up query both useful and relevant, ...
SESSION: Multimedia 1
Privacy-aware image classification and search
Sergej Zerr, Stefan Siersdorfer, Jonathon Hare, Elena Demidova
Pages: 35-44
doi>10.1145/2348283.2348292
Full text: PDF

Modern content sharing environments such as Flickr or YouTube contain a large amount of private resources such as photos showing weddings, family holidays, and private parties. These resources can be of a highly sensitive nature, disclosing many details ...
Manhattan hashing for large-scale image retrieval
Weihao Kong, Wu-Jun Li, Minyi Guo
Pages: 45-54
doi>10.1145/2348283.2348293
Full text: PDF

Hashing is used to learn binary-code representations for data with the expectation of preserving the neighborhood structure in the original feature space. Due to its fast query speed and reduced storage cost, hashing has been widely used for efficient nearest ...
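A minimal sketch of the lookup step this abstract refers to, assuming the common setup in which every item has already been encoded as a binary code (stored here as a Python int) and candidates are ranked by Hamming distance to the query code. It illustrates only the standard Hamming-ranking baseline, not the Manhattan hashing method proposed in the paper; the function names and codes are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two binary codes differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code: int, db_codes: list) -> list:
    """Return database indices ordered by Hamming distance to the query.

    Ties are broken by index so the output is deterministic.
    """
    return sorted(
        range(len(db_codes)),
        key=lambda i: (hamming(query_code, db_codes[i]), i),
    )


# Hypothetical 4-bit codes for a tiny database.
codes = [0b1010, 0b1110, 0b0101, 0b1011]
print(rank_by_hamming(0b1010, codes))  # [0, 1, 3, 2]
```

XOR followed by a popcount is what makes binary codes cheap to compare, which is the speed advantage the abstract mentions.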
Boosting multi-kernel locality-sensitive hashing for scalable image retrieval
Hao Xia, Pengcheng Wu, Steven C.H. Hoi, Rong Jin
Pages: 55-64
doi>10.1145/2348283.2348294
Full text: PDF

Similarity search is a key challenge for multimedia retrieval applications where data are usually represented in high-dimensional space. Among various algorithms proposed for similarity search in high-dimensional space, Locality-Sensitive Hashing (LSH) ...
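For reference, a minimal single-table sketch of classic random-hyperplane LSH for cosine similarity, assuming dense Python lists as feature vectors. It is not the boosted multi-kernel method of this paper, and a practical index would use several hash tables to raise recall; all names and data are hypothetical.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes through the origin."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, planes):
    """One bit per hyperplane: 1 if vec lies on its non-negative side, else 0."""
    return tuple(
        1 if sum(p_i * v_i for p_i, v_i in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def build_table(vectors, planes):
    """Hash every vector into a bucket keyed by its signature."""
    table = {}
    for idx, vec in enumerate(vectors):
        table.setdefault(signature(vec, planes), []).append(idx)
    return table


def lookup(query, table, planes):
    """Return candidate indices sharing the query's bucket (may be empty)."""
    return table.get(signature(query, planes), [])


planes = make_hyperplanes(dim=3, n_bits=8, seed=42)
vectors = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]
table = build_table(vectors, planes)
print(lookup([3.0, 0.0, 0.0], table, planes))  # [0, 1]
```

Vectors pointing in the same direction always share a signature, while the opposite vector falls on the other side of every hyperplane and lands in a different bucket.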
SESSION: Diversity 1
Diversity by proportionality: an election-based approach to search result diversification
Van Dang, W. Bruce Croft
Pages: 65-74
doi>10.1145/2348283.2348296
Full text: PDF

This paper presents a different perspective on diversity in search results: diversity by proportionality. We consider a result list most diverse, with respect to some set of topics related to the query, when the number of documents it provides on each ...
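A simplified illustration of the proportionality idea, assuming each query aspect has a known weight and its own ranked document list. Each result slot is awarded with the Sainte-Laguë quotient v / (2s + 1), a standard proportional seat-allocation rule from elections; this is a sketch under those assumptions and not necessarily the exact algorithm of the paper. The function name and inputs are hypothetical.

```python
def proportional_rerank(ranked_by_aspect, weights, k):
    """Fill k result slots so each aspect's share tracks its weight.

    ranked_by_aspect: aspect -> list of doc ids, best first (hypothetical input).
    weights: aspect -> non-negative importance of that aspect for the query.
    Each slot goes to the aspect with the largest Sainte-Lague quotient
    weight / (2 * seats + 1); that aspect then contributes its best unused doc.
    """
    seats = {a: 0 for a in weights}
    used, result = set(), []
    while len(result) < k:
        # Aspects that still have an unused document, in name order, so that
        # ties in the quotient go to the alphabetically first aspect.
        open_aspects = [
            a for a in sorted(weights)
            if any(d not in used for d in ranked_by_aspect.get(a, []))
        ]
        if not open_aspects:
            break
        winner = max(open_aspects, key=lambda a: weights[a] / (2 * seats[a] + 1))
        doc = next(d for d in ranked_by_aspect[winner] if d not in used)
        used.add(doc)
        result.append(doc)
        seats[winner] += 1
    return result


ranking = proportional_rerank(
    {"a": ["d1", "d2", "d3", "d4"], "b": ["d5", "d2", "d6"]},
    {"a": 0.6, "b": 0.4},
    k=5,
)
print(ranking)  # ['d1', 'd5', 'd2', 'd6', 'd3']
```

With weights 0.6 and 0.4, five slots split 3 to 2 between the aspects, mirroring their proportions.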
Explicit relevance models in intent-oriented information retrieval diversification
Saúl Vargas, Pablo Castells, David Vallet
Pages: 75-84
doi>10.1145/2348283.2348297
Full text: PDF

The intent-oriented search diversification methods developed in the field so far tend to build on generative views of the retrieval system to be diversified. Core algorithm components, in particular redundancy assessment, are expressed in terms of the ...
AspecTiles: tile-based visualization of diversified web search results
Mayu Iwata, Tetsuya Sakai, Takehiro Yamamoto, Yu Chen, Yi Liu, Ji-Rong Wen, Shojiro Nishio
Pages: 85-94
doi>10.1145/2348283.2348298
Full text: PDF

A diversified search result for an underspecified query generally contains web pages in which there are answers that are relevant to different aspects of the query. In order to help the user locate such relevant answers, we propose a simple extension ...
SESSION: Evaluation 1
Time-based calibration of effectiveness measures
Mark D. Smucker, Charles L.A. Clarke
Pages: 95-104
doi>10.1145/2348283.2348300
Full text: PDF

Many current effectiveness measures incorporate simplifying assumptions about user behavior. These assumptions prevent the measures from reflecting aspects of the search process that directly impact the quality of retrieval results as experienced by ...
Time drives interaction: simulating sessions in diverse searching environments
Feza Baskaya, Heikki Keskustalo, Kalervo Järvelin
Pages: 105-114
doi>10.1145/2348283.2348301
Full text: PDF

Real life information retrieval takes place in sessions, where users search by iterating between various cognitive, perceptual and motor subtasks through an interactive interface. The sessions may follow diverse strategies, which, together with the interface ...
Evaluating aggregated search pages
Ke Zhou, Ronan Cummins, Mounia Lalmas, Joemon M. Jose
Pages: 115-124
doi>10.1145/2348283.2348302
Full text: PDF

Aggregating search results from a variety of heterogeneous sources or verticals such as news, image and video into a single interface is a popular paradigm in web search. Although various approaches exist for selecting relevant verticals or optimising ...
SESSION: Structured data
Combining inverted indices and structured search for ad-hoc object retrieval
Alberto Tonon, Gianluca Demartini, Philippe Cudré-Mauroux
Pages: 125-134
doi>10.1145/2348283.2348304
Full text: PDF

Retrieving semi-structured entities to answer keyword queries is an increasingly important feature of many modern Web applications. The fast-growing Linked Open Data (LOD) movement makes it possible to crawl and index very large amounts of structured ...
Retrieving similar discussion forum threads: a structure based approach
Amit Singh, Deepak P, Dinesh Raghu
Pages: 135-144
doi>10.1145/2348283.2348305
Full text: PDF

Online forums are becoming a popular way of finding useful information on the web. Search over forums for existing discussion threads so far is limited to keyword-based search due to the minimal effort required on the part of the users. However, it is often ...
Summarizing highly structured documents for effective search interaction
Lanbo Zhang, Yi Zhang, Yunfei Chen
Pages: 145-154
doi>10.1145/2348283.2348306
Full text: PDF

As highly structured documents with rich metadata (such as products, movies, etc.) become increasingly prevalent, searching those documents has become an important IR problem. Unfortunately, existing work on document summarization, especially in the context ...
SESSION: Recommender systems 1
TFMAP: optimizing MAP for top-n context-aware recommendation
Yue Shi, Alexandros Karatzoglou, Linas Baltrunas, Martha Larson, Alan Hanjalic, Nuria Oliver
Pages: 155-164
doi>10.1145/2348283.2348308
Full text: PDF

In this paper, we tackle the problem of top-N context-aware recommendation for implicit feedback scenarios. We frame this challenge as a ranking problem in collaborative filtering (CF). Much of the past work on CF has not focused on evaluation metrics ...
Increasing temporal diversity with purchase intervals
Gang Zhao, Mong Li Lee, Wynne Hsu, Wei Chen
Pages: 165-174
doi>10.1145/2348283.2348309
Full text: PDF

The development of Web 2.0 technology has led to huge economic benefits and challenges for both e-commerce websites and online shoppers. One core technology to increase sales and consumers' satisfaction is the use of recommender systems. Existing product ...
Adaptive diversification of recommendation results via latent factor portfolio
Yue Shi, Xiaoxue Zhao, Jun Wang, Martha Larson, Alan Hanjalic
Pages: 175-184
doi>10.1145/2348283.2348310
Full text: PDF

This paper studies result diversification in collaborative filtering. We argue that the diversification level in a recommendation list should be adapted to the target users' individual situations and needs. Different users may have different ranges of ...
SESSION: Users 1: personalization and user modeling
Modeling the impact of short- and long-term behavior on search personalization
Paul N. Bennett, Ryen W. White, Wei Chu, Susan T. Dumais, Peter Bailey, Fedor Borisyuk, Xiaoyuan Cui
Pages: 185-194
doi>10.1145/2348283.2348312
Full text: PDF

User behavior provides many cues to improve the relevance of search results through personalization. One aspect of user behavior that provides especially strong signals for delivering better relevance is an individual's history of queries and clicked ...
Improving searcher models using mouse cursor activity
Jeff Huang, Ryen W. White, Georg Buscher, Kuansan Wang
Pages: 195-204
doi>10.1145/2348283.2348313
Full text: PDF

Web search components such as ranking and query suggestions analyze the user data provided in query and click logs. While this data is easy to collect and provides information about user behavior, it omits user interactions with the search engine that ...
Personalization of search results using interaction behaviors in search sessions
Chang Liu, Nicholas J. Belkin, Michael J. Cole
Pages: 205-214
doi>10.1145/2348283.2348314
Full text: PDF

Personalization of search results offers the potential for significant improvement in information retrieval performance. User interactions with the system and documents during information-seeking sessions provide a wealth of information about user preferences ...
User evaluation of query quality
Wan-Ching Wu, Diane Kelly, Kun Huang
Pages: 215-224
doi>10.1145/2348283.2348315
Full text: PDF

Although a great deal of research has been conducted about automatic techniques for determining query quality, there have been relatively few studies about how people judge query quality. This study investigated this topic through a laboratory experiment ...
SESSION: Architectures 1
Efficient in-memory top-k document retrieval
J. Shane Culpepper, Matthias Petri, Falk Scholer
Pages: 225-234
doi>10.1145/2348283.2348317
Full text: PDF

For over forty years the dominant data structure for ranked document retrieval has been the inverted index. Inverted indexes are effective for a variety of document retrieval tasks, and particularly efficient for large data collection scenarios that ...
Index maintenance for time-travel text search
Avishek Anand, Srikanta Bedathur, Klaus Berberich, Ralf Schenkel
Pages: 235-244
doi>10.1145/2348283.2348318
Full text: PDF

Time-travel text search enriches standard text search by temporal predicates, so that users of web archives can easily retrieve document versions that are considered relevant to a given keyword query and existed during a given time interval. Different ...
Optimizing positional index structures for versioned document collections
Jinru He, Torsten Suel
Pages: 245-254
doi>10.1145/2348283.2348319
Full text: PDF

Versioned document collections are collections that contain multiple versions of each document. Important examples are Web archives, Wikipedia and other wikis, or source code and documents maintained in revision control systems. Versioned document collections ...
To index or not to index: time-space trade-offs in search engines with positional ranking functions
Diego Arroyuelo, Senén González, Mauricio Marin, Mauricio Oyarzún, Torsten Suel
Pages: 255-264
doi>10.1145/2348283.2348320
Full text: PDF

Positional ranking functions, widely used in Web search engines, improve result quality by exploiting the positions of the query terms within documents. However, it is well known that positional indexes demand large amounts of extra space, typically ...
SESSION: Search log analysis
Studies of the onset and persistence of medical concerns in search logs
Ryen W. White, Eric Horvitz
Pages: 265-274
doi>10.1145/2348283.2348322
Full text: PDF

The Web provides a wealth of information about medical symptoms and disorders. Although this content is often valuable to consumers, studies have found that interaction with Web content may heighten anxiety and stimulate healthcare utilization. We present ...
A semi-supervised approach to modeling web search satisfaction
Ahmed Hassan
Pages: 275-284
doi>10.1145/2348283.2348323
Full text: PDF

Web search is an interactive process that involves actions from Web search users and responses from the search engine. Many research efforts have been made to address the problem of understanding search behavior in general. Some of this work focused ...
Social annotations: utility and prediction modeling
Patrick Pantel, Michael Gamon, Omar Alonso, Kevin Haas
Pages: 285-294
doi>10.1145/2348283.2348324
Full text: PDF

Social features are increasingly integrated within the search results page of the main commercial search engines. There is, however, little understanding of the utility of social features in traditional search. In this paper, we study utility in the ...
An exploration of ranking heuristics in mobile local search
Yuanhua Lv, Dimitrios Lymberopoulos, Qiang Wu
Pages: 295-304
doi>10.1145/2348283.2348325
Full text: PDF

Users increasingly rely on their mobile devices to search local entities, typically businesses, while on the go. Even though recent work has recognized that the ranking signals in mobile local search (e.g., distance and customer rating score of a business) ...
SESSION: User intent
Mining query subtopics from search log data
Yunhua Hu, Yanan Qian, Hang Li, Daxin Jiang, Jian Pei, Qinghua Zheng
Pages: 305-314
doi>10.1145/2348283.2348327
Full text: PDF

Most queries in web search are ambiguous and multifaceted. Identifying the major senses and facets of queries from search log data, referred to as query subtopic mining in this paper, is a very important issue in web search. Through search log analysis, ...
Search, interrupted: understanding and predicting search task continuation
Eugene Agichtein, Ryen W. White, Susan T. Dumais, Paul N. Bennett
Pages: 315-324
doi>10.1145/2348283.2348328
Full text: PDF

Many important search tasks require multiple search sessions to complete. Tasks such as travel planning, large purchases, or job searches can span hours, days, or even weeks. Inevitably, life interferes, requiring the searcher either to recover the "state" ...
Multi-aspect query summarization by composite query
Wei Song, Qing Yu, Zhiheng Xu, Ting Liu, Sheng Li, Ji-Rong Wen
Pages: 325-334
doi>10.1145/2348283.2348329
Full text: PDF

Conventional search engines usually return a ranked list of web pages in response to a query. Users have to visit several pages to locate the relevant parts. A promising future search scenario should involve: (1) understanding user intents; (2) providing ...
Language intent models for inferring user browsing behavior
Manos Tsagkias, Roi Blanco
Pages: 335-344
doi>10.1145/2348283.2348330
Full text: PDF

Modeling user browsing behavior is an active research area with tangible real-world applications; e.g., organizations can adapt their online presence to their visitors' browsing behavior, with positive effects on user engagement and revenue. We concentrate ...
SESSION: Efficiency
Efficient query recommendations in the long tail via center-piece subgraphs
Francesco Bonchi, Raffaele Perego, Fabrizio Silvestri, Hossein Vahabi, Rossano Venturini
Pages: 345-354
doi>10.1145/2348283.2348332
Full text: PDF

We present a recommendation method based on the well-known concept of the center-piece subgraph, which allows the time- and space-efficient generation of suggestions even for rare, i.e., long-tail, queries. Our method is scalable with respect to both the size ...
Supporting efficient top-k queries in type-ahead search
Guoliang Li, Jiannan Wang, Chen Li, Jianhua Feng
Pages: 355-364
doi>10.1145/2348283.2348333
Full text: PDF

Type-ahead search finds answers on the fly as a user types in a keyword query. A main challenge in this search paradigm is the high-efficiency requirement that queries must be answered within milliseconds. In this paper we study how to answer top-k ...
SimFusion+: extending simfusion towards efficient estimation on large and dynamic networks
Weiren Yu, Xuemin Lin, Wenjie Zhang, Ying Zhang, Jiajin Le
Pages: 365-374
doi>10.1145/2348283.2348334
Full text: PDF

SimFusion has become a captivating measure of similarity between objects in a web graph. It is iteratively distilled from the notion that "the similarity between two objects is reinforced by the similarity of their related objects". The existing SimFusion ...
Group matrix factorization for scalable topic modeling
Quan Wang, Zheng Cao, Jun Xu, Hang Li
Pages: 375-384
doi>10.1145/2348283.2348335
Full text: PDF

Topic modeling can reveal the latent structure of text data and is useful for knowledge discovery, search relevance ranking, document classification, and so on. One of the major challenges in topic modeling is to deal with large datasets and large numbers ...
SESSION: Spam and abuse
Detecting quilted web pages at scale
Marc Najork
Pages: 385-394
doi>10.1145/2348283.2348337
Full text: PDF

Web-based advertising and electronic commerce, combined with the key role of search engines in driving visitors to ad-monetized and e-commerce web sites, have given rise to the phenomenon of web spam: web pages that are of little value to visitors, but ...
Fighting against web spam: a novel propagation method based on click-through data
Chao Wei, Yiqun Liu, Min Zhang, Shaoping Ma, Liyun Ru, Kuo Zhang
Pages: 395-404
doi>10.1145/2348283.2348338
Full text: PDF

Combating Web spam is one of the greatest challenges for Web search engines. State-of-the-art anti-spam techniques focus mainly on detecting varieties of spam strategies, such as content spamming and link-based spamming. Although these anti-spam approaches ...
Learning hash codes for efficient content reuse detection
Qi Zhang, Yan Wu, Zhuoye Ding, Xuanjing Huang
Pages: 405-414
doi>10.1145/2348283.2348339
Full text: PDF

Content reuse is extremely common in user-generated media. Reuse detection serves as the basis for many applications. However, with the explosion of the Internet and the continuously growing use of user-generated media, the task becomes more critical ...
SESSION: Users 2: exploratory search
Explanatory semantic relatedness and explicit spatialization for exploratory search
Brent Hecht, Samuel H. Carton, Mahmood Quaderi, Johannes Schöning, Martin Raubal, Darren Gergle, Doug Downey
Pages: 415-424
doi>10.1145/2348283.2348341
Full text: PDF

Exploratory search, in which a user investigates complex concepts, is cumbersome with today's search engines. We present a new exploratory search approach that generates interactive visualizations of query concepts using thematic cartography (e.g. choropleth ...
A subjunctive exploratory search interface to support media studies researchers
Marc Bron, Jasmijn van Gorp, Frank Nack, Maarten de Rijke, Andrei Vishneuski, Sonja de Leeuw
Pages: 425-434
doi>10.1145/2348283.2348342
Full text: PDF

Media studies concerns the study of production, content, and/or reception of various types of media. Today's continuous production and storage of media is changing the way media studies researchers work and requires the development of new search models ...
Task complexity, vertical display and user interaction in aggregated search
Jaime Arguello, Wan-Ching Wu, Diane Kelly, Ashlee Edwards
Pages: 435-444
doi>10.1145/2348283.2348343
Full text: PDF

Aggregated search is the task of blending results from specialized search services or verticals into the Web search results. While many studies have focused on aggregated search techniques, few studies have tried to better understand how users interact ...
SESSION: Multimedia 2
Image ranking based on user browsing behavior
Michele Trevisiol, Luca Chiarandini, Luca Maria Aiello, Alejandro Jaimes
Pages: 445-454
doi>10.1145/2348283.2348345
Full text: PDF

Ranking of images is difficult because many factors determine their importance (e.g., popularity, quality, entertainment value, context, etc.). In social media platforms, ranking also depends on social interactions and on the visibility of the images ...
Modeling concept dynamics for large scale music search
Jialie Shen, HweeHwa Pang, Meng Wang, Shuicheng Yan
Pages: 455-464
doi>10.1145/2348283.2348346
Full text: PDF

Continuing advances in data storage and communication technologies have led to an explosive growth in digital music collections. To cope with their increasing scale, we need effective Music Information Retrieval (MIR) capabilities like tagging, concept ...
Finding translations in scanned book collections
Ismet Zeki Yalniz, R. Manmatha
Pages: 465-474
doi>10.1145/2348283.2348347
Full text: PDF

This paper describes an approach for identifying translations of books in large scanned book collections with OCR errors. The method is based on the idea that although individual sentences do not necessarily preserve the word order when translated, a ...
SESSION: Recommender systems 2
Predicting the ratings of multimedia items for making personalized recommendations
Rani Qumsiyeh, Yiu-Kai Ng
Pages: 475-484
doi>10.1145/2348283.2348349
Full text: PDF

Existing multimedia recommenders suggest a specific type of multimedia item rather than items of different types personalized for a user based on his/her preferences. Suppose a user is interested in a particular family movie; it is appealing if a ...
Personalized click shaping through lagrangian duality for online recommendation
Deepak Agarwal, Bee-Chung Chen, Pradheep Elango, Xuanhui Wang
Pages: 485-494
doi>10.1145/2348283.2348350
Full text: PDF

Online content recommendation aims to identify trendy articles in a continuously changing, dynamic content pool. Most existing work relies on online user feedback, notably clicks, as the objective and maximizes it by showing articles with the highest click-through ...
What reviews are satisfactory: novel features for automatic helpfulness voting
Yu Hong, Jun Lu, Jianmin Yao, Qiaoming Zhu, Guodong Zhou
Pages: 495-504
doi>10.1145/2348283.2348351
Full text: PDF

This paper focuses on exploring the features of product reviews that satisfy users, by which to improve the automatic helpfulness voting for the reviews on commercial websites. Compared to the previous work, which single-mindedly adopts the textual features ...
SESSION: Query expansion and reformulation
Automatic refinement of patent queries using concept importance predictors
Parvaz Mahdabi, Linda Andersson, Mostafa Keikha, Fabio Crestani
Pages: 505-514
doi>10.1145/2348283.2348353
Full text: PDF

Patent prior art queries are full patent applications which are much longer than standard web search topics. Such queries are composed of hundreds of terms and do not represent a focused information need. One way to make the queries more focused is to ...
Automatic term mismatch diagnosis for selective query expansion
Le Zhao, Jamie Callan
Pages: 515-524
doi>10.1145/2348283.2348354
Full text: PDF

People are seldom aware that their search queries frequently mismatch a majority of the relevant documents. This may not be a big problem for topics with a large and diverse set of relevant documents, but it can greatly increase the chance of search failure ...
Generating reformulation trees for complex queries
Xiaobing Xue, W. Bruce Croft
Pages: 525-534
doi>10.1145/2348283.2348355
Full text: PDF

Search queries have evolved beyond keyword queries. Many complex queries such as verbose queries, natural language question queries and document-based queries are widely used in a variety of applications. Processing these complex queries usually requires ...
Proximity-based rocchio's model for pseudo relevance
Jun Miao, Jimmy Xiangji Huang, Zheng Ye
Pages: 535-544
doi>10.1145/2348283.2348356
Full text: PDF

Rocchio's relevance feedback model is a classic query expansion method and it has been shown to be effective in boosting information retrieval performance. The selection of expansion terms in this method, however, does not take into account the relationship ...
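The classic Rocchio update is q' = αq + (β/|R|) Σ_{d∈R} d − (γ/|N|) Σ_{d∈N} d. In the pseudo-relevance setting the top-ranked documents are taken as R and N is usually empty, so the γ term drops out. A minimal sketch with sparse term-weight dictionaries, assuming α = 1.0, β = 0.75 and a fixed number of expansion terms; it does not include the proximity weighting this paper adds, and all names and data are hypothetical.

```python
from collections import defaultdict


def rocchio_prf(query, feedback_docs, alpha=1.0, beta=0.75, n_terms=3):
    """Classic Rocchio expansion with pseudo-relevance feedback.

    query: term -> weight; feedback_docs: list of term -> weight dicts taken
    from the top of an initial ranking (all treated as relevant). No
    non-relevant set is used, so Rocchio's gamma term is omitted.
    """
    # Centroid of the pseudo-relevant documents.
    centroid = defaultdict(float)
    for doc in feedback_docs:
        for term, w in doc.items():
            centroid[term] += w / len(feedback_docs)

    # Pick the n_terms strongest non-query terms (ties broken alphabetically).
    candidates = sorted(
        (t for t in centroid if t not in query),
        key=lambda t: (-centroid[t], t),
    )
    expansion = set(candidates[:n_terms])

    # q' = alpha * q + beta * centroid, restricted to query + expansion terms.
    new_query = {t: alpha * w for t, w in query.items()}
    for t, w in centroid.items():
        if t in query or t in expansion:
            new_query[t] = new_query.get(t, 0.0) + beta * w
    return new_query


docs = [
    {"jaguar": 2.0, "car": 1.0, "speed": 1.0},
    {"jaguar": 1.0, "car": 2.0, "engine": 1.0},
]
print(rocchio_prf({"jaguar": 1.0}, docs, n_terms=2))
# {'jaguar': 2.125, 'car': 1.125, 'engine': 0.375}
```

Note that expansion terms are chosen purely by their centroid weight, without regard to where they occur relative to the query terms, which is the limitation the abstract points out.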
SESSION: Social media 1
Modeling user posting behavior on social media
Zhiheng Xu, Yang Zhang, Yao Wu, Qing Yang
Pages: 545-554
doi>10.1145/2348283.2348358
Full text: PDF

User generated content is the basic element of social media websites. Relatively few studies have systematically analyzed the motivation to create and share content, especially from the perspective of a common user. In this paper, we perform a comprehensive ...
Friend or frenemy?: predicting signed ties in social networks
Shuang-Hong Yang, Alexander J. Smola, Bo Long, Hongyuan Zha, Yi Chang
Pages: 555-564
doi>10.1145/2348283.2348359
Full text: PDF

We study the problem of labeling the edges of a social network graph (e.g., acquaintance connections in Facebook) as either positive (i.e., trust, true friendship) or negative (i.e., distrust, possible frenemy) relations. Such signed relations provide ...
Social-network analysis using topic models
Youngchul Cha, Junghoo Cho
Pages: 565-574
doi>10.1145/2348283.2348360
Full text: PDF

In this paper, we discuss how we can extend probabilistic topic models to analyze the relationship graph of popular social-network data, so that we can group or label the edges and nodes in the graph based on their topic similarity. In particular, we ...
Cognos: crowdsourcing search for topic experts in microblogs
Saptarshi Ghosh, Naveen Sharma, Fabricio Benevenuto, Niloy Ganguly, Krishna Gummadi
Pages: 575-590
doi>10.1145/2348283.2348361

Finding topic experts on microblogging sites with millions of users, such as Twitter, is a challenging problem. In this paper, we propose and investigate a new methodology for discovering topic experts in the popular Twitter social network. ...
SESSION: Query completion and correction
Automatic suggestion of query-rewrite rules for enterprise search
Zhuowei Bao, Benny Kimelfeld, Yunyao Li
Pages: 591-600
doi>10.1145/2348283.2348363

Enterprise search is challenging for several reasons, notably the dynamic terminology and jargon that are specific to the enterprise domain. This challenge is partly addressed by having domain experts maintaining the enterprise search engine and adapting ...
Time-sensitive query auto-completion
Milad Shokouhi, Kira Radinsky
Pages: 601-610
doi>10.1145/2348283.2348364

Query auto-completion (QAC) is a common feature in modern search engines. High quality QAC candidates enhance search experience by saving users time that otherwise would be spent on typing each character or word sequentially. Current QAC methods rank ...
A generalized hidden Markov model with discriminative training for query spelling correction
Yanen Li, Huizhong Duan, ChengXiang Zhai
Pages: 611-620
doi>10.1145/2348283.2348365

Query spelling correction is a crucial component of modern search engines. Existing methods in the literature for search query spelling correction have two major drawbacks. First, they are unable to handle certain important types of spelling errors, ...
SESSION: Architectures 2
Learning to predict response times for online query scheduling
Craig Macdonald, Nicola Tonellotto, Iadh Ounis
Pages: 621-630
doi>10.1145/2348283.2348367

Dynamic pruning strategies permit efficient retrieval by not fully scoring all postings of the documents matching a query -- without degrading the retrieval effectiveness of the top-ranked results. However, the amount of pruning achievable for a query ...
Prefetching query results and its impact on search engines
Simon Jonassen, B. Barla Cambazoglu, Fabrizio Silvestri
Pages: 631-640
doi>10.1145/2348283.2348368

We investigate the impact of query result prefetching on the efficiency and effectiveness of web search engines. We propose offline and online strategies for selecting and ordering queries whose results are to be prefetched. The offline strategies rely ...
Online result cache invalidation for real-time web search
Xiao Bai, Flavio P. Junqueira
Pages: 641-650
doi>10.1145/2348283.2348369

Caches of results are critical components of modern Web search engines, since they enable lower response time to frequent queries and reduce the load to the search engine backend. Results in long-lived cache entries may become stale, however, as search ...
SESSION: Recommender systems 3
Learning to rank social update streams
Liangjie Hong, Ron Bekkerman, Joseph Adler, Brian D. Davison
Pages: 651-660
doi>10.1145/2348283.2348371

As online social media integrates further into our lives, we spend more time consuming social update streams that come from our online connections. Although social update streams provide a tremendous opportunity for us to access information on-the-fly, ...
Collaborative personalized tweet recommendation
Kailong Chen, Tianqi Chen, Guoqing Zheng, Ou Jin, Enpeng Yao, Yong Yu
Pages: 661-670
doi>10.1145/2348283.2348372

Twitter has rapidly grown into a popular social network in recent years and provides a large number of real-time messages for users. Tweets are presented in chronological order and users scan the followees' timelines to find what they are interested in. ...
Exploring social influence for recommendation: a generative model approach
Mao Ye, Xingjie Liu, Wang-Chien Lee
Pages: 671-680
doi>10.1145/2348283.2348373

Social friendship has been shown beneficial for item recommendation for years. However, existing approaches mostly incorporate social friendship into recommender systems by heuristics. In this paper, we argue that social influence between ...
SESSION: Multimedia 3
See-to-retrieve: efficient processing of spatio-visual keyword queries
Chao Zhang, Lidan Shou, Ke Chen, Gang Chen
Pages: 681-690
doi>10.1145/2348283.2348375

The wide proliferation of powerful smart phones equipped with multiple sensors, 3D graphical engine, and 3G connection has nurtured the creation of a new spectrum of visual mobile applications. These applications require novel data retrieval techniques ...
Placing images on the world map: a microblog-based enrichment approach
Claudia Hauff, Geert-Jan Houben
Pages: 691-700
doi>10.1145/2348283.2348376

Estimating the geographic location of images is a task which has received increasing attention recently. Large numbers of images uploaded to platforms such as Flickr do not contain GPS-based latitude/longitude coordinates. Obtaining such geographic information ...
Where is who: large-scale photo retrieval by facial attributes and canvas layout
Yu-Heng Lei, Yan-Ying Chen, Bor-Chun Chen, Lime Iida, Winston H. Hsu
Pages: 701-710
doi>10.1145/2348283.2348377

The ubiquitous availability of digital cameras has made it easier than ever to capture moments of life, especially the ones accompanied with friends and family. It is generally believed that most family photos contain faces that are sparsely tagged. ...
SESSION: Entities
Mining the web for points of interest
Adam Rae, Vanessa Murdock, Adrian Popescu, Hugues Bouchard
Pages: 711-720
doi>10.1145/2348283.2348379

A point of interest (POI) is a focused geographic entity such as a landmark, a school, an historical building, or a business. Points of interest are the basis for most of the data supporting location-based applications. In this paper we propose ...
TwiNER: named entity recognition in targeted Twitter stream
Chenliang Li, Jianshu Weng, Qi He, Yuxia Yao, Anwitaman Datta, Aixin Sun, Bu-Sung Lee
Pages: 721-730
doi>10.1145/2348283.2348380

Many private and/or public organizations have been reported to create and monitor targeted Twitter streams to collect and understand users' opinions about the organizations. Targeted Twitter stream is usually constructed by filtering tweets ...
Adaptive context features for toponym resolution in streaming news
Michael D. Lieberman, Hanan Samet
Pages: 731-740
doi>10.1145/2348283.2348381

News sources around the world generate constant streams of information, but effective streaming news retrieval requires an intimate understanding of the geographic content of news. This process of understanding, known as geotagging, consists of first ...
SESSION: Learning to rank
Structural relationships for large-scale learning of answer re-ranking
Aliaksei Severyn, Alessandro Moschitti
Pages: 741-750
doi>10.1145/2348283.2348383

Supervised learning applied to answer re-ranking can greatly improve the overall accuracy of question answering (QA) systems. The key aspect is that the relationships and properties of the question/answer pair composed of a question and the supporting ...
Top-k learning to rank: labeling, ranking and evaluation
Shuzi Niu, Jiafeng Guo, Yanyan Lan, Xueqi Cheng
Pages: 751-760
doi>10.1145/2348283.2348384

In this paper, we propose a novel top-k learning to rank framework, which involves labeling strategy, ranking model and evaluation measure. The motivation comes from the difficulty in obtaining reliable relevance judgments from human assessors when applying ...
Robust ranking models via risk-sensitive optimization
Lidan Wang, Paul N. Bennett, Kevyn Collins-Thompson
Pages: 761-770
doi>10.1145/2348283.2348385

Many techniques for improving search result quality have been proposed. Typically, these techniques increase average effectiveness by devising advanced ranking features and/or by developing sophisticated learning to rank algorithms. However, while these ...
SESSION: Community QA
Dual role model for question recommendation in community question answering
Fei Xu, Zongcheng Ji, Bin Wang
Pages: 771-780
doi>10.1145/2348283.2348387

Question recommendation that automatically recommends a new question to suitable users to answer is an appealing and challenging problem in the research area of Community Question Answering (CQA). Unlike in general recommender systems where a user has ...
Vote calibration in community question-answering systems
Bee-Chung Chen, Anirban Dasgupta, Xuanhui Wang, Jie Yang
Pages: 781-790
doi>10.1145/2348283.2348388

User votes are important signals in community question-answering (CQA) systems. Many features of typical CQA systems, e.g. the best answer to a question, status of a user, are dependent on ratings or votes cast by the community. In a popular CQA site, ...
Category hierarchy maintenance: a data-driven approach
Quan Yuan, Gao Cong, Aixin Sun, Chin-Yew Lin, Nadia Magnenat Thalmann
Pages: 791-800
doi>10.1145/2348283.2348389

Category hierarchies often evolve at a much slower pace than the documents that reside in them. As newly available documents keep being added to a hierarchy, new topics emerge and documents within the same category become less topically cohesive. In this paper, ...
When web search fails, searchers become askers: understanding the transition
Qiaoling Liu, Eugene Agichtein, Gideon Dror, Yoelle Maarek, Idan Szpektor
Pages: 801-810
doi>10.1145/2348283.2348390

While Web search has become increasingly effective over the last decade, for many users' needs the required answers may be spread across many documents, or may not exist on the Web at all. Yet, many of these needs could be addressed by asking people ...
SESSION: Federated search
Content-based retrieval for heterogeneous domains: domain adaptation by relative aggregation points
Makoto P. Kato, Hiroaki Ohshima, Katsumi Tanaka
Pages: 811-820
doi>10.1145/2348283.2348392

We introduce the problem of domain adaptation for content-based retrieval and propose a domain adaptation method based on relative aggregation points (RAPs). Content-based retrieval including image retrieval and spoken document retrieval enables a user ...
Mixture model with multiple centralized retrieval algorithms for result merging in federated search
Dzung Hong, Luo Si
Pages: 821-830
doi>10.1145/2348283.2348393

Result merging is an important research problem in federated search for merging documents retrieved from multiple ranked lists of selected information sources into a single list. The state-of-the-art result merging algorithms such as Semi-Supervised ...
Reactive index replication for distributed search engines
Flavio P. Junqueira, Vincent Leroy, Matthieu Morel
Pages: 831-840
doi>10.1145/2348283.2348394

Distributed search engines comprise multiple sites deployed across geographically distant regions, each site being specialized to serve the queries of local users. When a search site cannot accurately compute the results of a query, it must forward the ...
SESSION: Diversity 2
Personalized diversification of search results
David Vallet, Pablo Castells
Pages: 841-850
doi>10.1145/2348283.2348396

Search personalization and diversification are often seen as opposing alternatives to cope with query uncertainty, where, given an ambiguous query, it is either preferable to adapt the search result to a specific aspect that may interest the user (personalization) ...
Combining implicit and explicit topic representations for result diversification
Jiyin He, Vera Hollink, Arjen de Vries
Pages: 851-860
doi>10.1145/2348283.2348397

Result diversification deals with ambiguous or multi-faceted queries by providing documents that cover as many subtopics of a query as possible. Various approaches to subtopic modeling have been proposed. Subtopics have been extracted internally, e.g., ...
Using preference judgments for novel document retrieval
Praveen Chandar, Ben Carterette
Pages: 861-870
doi>10.1145/2348283.2348398

There has been considerable interest in incorporating diversity in search results to account for redundancy and the space of possible user needs. Most work on this problem is based on subtopics: diversity rankers score documents against a set ...
SESSION: Evaluation 2
Quality through flow and immersion: gamifying crowdsourced relevance assessments
Carsten Eickhoff, Christopher G. Harris, Arjen P. de Vries, Padmini Srinivasan
Pages: 871-880
doi>10.1145/2348283.2348400

Crowdsourcing is a market of steadily-growing importance upon which both academia and industry increasingly rely. However, this market appears to be inherently infested with a significant share of malicious workers who try to maximise their profits through ...
An IR-based evaluation framework for web search query segmentation
Rishiraj Saha Roy, Niloy Ganguly, Monojit Choudhury, Srivatsan Laxman
Pages: 881-890
doi>10.1145/2348283.2348401

This paper presents the first evaluation framework for Web search query segmentation based directly on IR performance. In the past, segmentation strategies were mainly validated against manual annotations. Our work shows that the goodness of a segmentation ...
On per-topic variance in IR evaluation
Stephen E. Robertson, Evangelos Kanoulas
Pages: 891-900
doi>10.1145/2348283.2348402

We explore the notion, put forward by Cormack & Lynam and Robertson, that we should consider a document collection used for Cranfield-style experiments as a sample from some larger population of documents. In this view, any per-topic metric (such ...
An uncertainty-aware query selection model for evaluation of IR systems
Mehdi Hosseini, Ingemar J. Cox, Natasa Milic-Frayling, Milad Shokouhi, Emine Yilmaz
Pages: 901-910
doi>10.1145/2348283.2348403

We propose a mathematical framework for query selection as a mechanism for reducing the cost of constructing information retrieval test collections. In particular, our mathematical formulation explicitly models the uncertainty in the retrieval effectiveness ...
SESSION: Representation
Improving retrieval of short texts through document expansion
Miles Efron, Peter Organisciak, Katrina Fenlon
Pages: 911-920
doi>10.1145/2348283.2348405

Collections containing a large number of short documents are becoming increasingly common. As these collections grow in number and size, providing effective retrieval of brief texts presents a significant research problem. We propose a novel approach ...
Extending BM25 with multiple query operators
Roi Blanco, Paolo Boldi
Pages: 921-930
doi>10.1145/2348283.2348406

Traditional probabilistic relevance frameworks for information retrieval refrain from taking positional information into account, due to the hurdles of developing a sound model while avoiding an explosion in the number of parameters. Nonetheless, the ...
Rhetorical relations for information retrieval
Christina Lioma, Birger Larsen, Wei Lu
Pages: 931-940
doi>10.1145/2348283.2348407

Typically, every part in most coherent text has some plausible reason for its presence, some function that it performs to the overall semantics of the text. Rhetorical relations, e.g. contrast, cause, explanation, describe how the parts of a text are ...
Modeling higher-order term dependencies in information retrieval using query hypergraphs
Michael Bendersky, W. Bruce Croft
Pages: 941-950
doi>10.1145/2348283.2348408

Many of the recent, and more effective, retrieval models have incorporated dependencies between the terms in the query. In this paper, we advance this query representation one step further, and propose a retrieval framework that models higher-order term ...
SESSION: Classification
Confidence-aware graph regularization with heterogeneous pairwise features
Yuan Fang, Bo-June (Paul) Hsu, Kevin Chen-Chuan Chang
Pages: 951-960
doi>10.1145/2348283.2348410

Conventional classification methods tend to focus on features of individual objects, while missing out on potentially valuable pairwise features that capture the relationships between objects. Although recent developments on graph regularization exploit ...
A utility-theoretic ranking method for semi-automated text classification
Giacomo Berardi, Andrea Esuli, Fabrizio Sebastiani
Pages: 961-970
doi>10.1145/2348283.2348411

In Semi-Automated Text Classification (SATC) an automatic classifier F labels a set of unlabelled documents D, following which a human annotator inspects (and corrects when appropriate) the labels attributed by F to a subset D' of D, with the aim of ...
Improving tweet stream classification by detecting changes in word probability
Kyosuke Nishida, Takahide Hoshide, Ko Fujimura
Pages: 971-980
doi>10.1145/2348283.2348412

We propose a classification model of tweet streams in Twitter, which are representative of document streams whose statistical properties will change over time. Our model solves several problems that hinder the classification of tweets; in particular, ...
Predicting quality flaws in user-generated content: the case of Wikipedia
Maik Anderka, Benno Stein, Nedim Lipka
Pages: 981-990
doi>10.1145/2348283.2348413

The detection and improvement of low-quality information is a key concern in Web applications that are based on user-generated content; a popular example is the online encyclopedia Wikipedia. Existing research on quality assessment of user-generated ...
SESSION: Doctoral submissions
A knowledge-based approach for summarising opinions
Marco Bonzanini
Pages: 991-991
doi>10.1145/2348283.2348415

Automatic Document Summarisation plays a central role in the process of providing the user with quick access to information. Applications range from the generation of news headlines, to the aggregation of opinions extracted from reviews. Traditional ...
Adaptive IR for exploratory search support
Daniel T.J. Backhausen
Pages: 992-992
doi>10.1145/2348283.2348416

Most Information Retrieval (IR) software is designed to fit a general user where users are submitting queries and the retrieval system returns a ranked list of results. Regardless of the user, the query always returns the same list of results. Individual ...
Adversarial content manipulation effects
Fiana Raiber
Pages: 993-993
doi>10.1145/2348283.2348417

We address a question that has been somewhat overlooked throughout the transition from classical ad hoc retrieval to Web search: how is the performance of classical retrieval approaches affected by the presence of content manipulation? Our initial experiments ...
Building reputation and trust using federated search and opinion mining
Somayeh Khatiban
Pages: 994-994
doi>10.1145/2348283.2348418

The term online reputation addresses trust relationships amongst agents in dynamic open systems. These can appear as ratings, recommendations, referrals and feedback. Several reputation models and rating aggregation algorithms have been proposed. However, ...
Enhancing knowledge base with knowledge transfer
Si-Chi Chin
Pages: 995-995
doi>10.1145/2348283.2348419

A Knowledge Base (KB) stores, organizes, and shares information pertinent to entities (i.e. KB nodes) such as people, organizations, and events. A large KB system, such as Wikipedia, relies on human curators to create and maintain the content in the ...
Improving e-discovery using information retrieval
Kripabandhu Ghosh
Pages: 996-996
doi>10.1145/2348283.2348420

E-discovery is the requirement that the documents and information in electronic form stored in corporate systems be produced as evidence in litigation. It has posed great challenges for legal experts. Legal searchers have always looked to find "any and ...
Opinion influence and diffusion in social network
Dehong Gao
Pages: 997-997
doi>10.1145/2348283.2348421

Nowadays, more and more people tend to make decisions based on the opinion information from the Internet, in addition to recommendations from offline friends or parents. For example, we may browse the resumes and comments on election candidates to determine ...
Relevance as a subjective and situational multidimensional concept
Carsten Eickhoff
Pages: 998-998
doi>10.1145/2348283.2348422

Relevance is the central concept of information retrieval. Although its important role is unanimously accepted among researchers, numerous different definitions of the term have emerged over the years. Considerable effort has been put into creating consistent ...
Exploiting temporal topic models in social media retrieval
Tuan A. Tran
Pages: 999-999
doi>10.1145/2348283.2348423

Much of the user-generated content on the Web 2.0 centers around real-world incidents such as the Japanese tsunami, or general concerns such as the recent economic downturn. Such information is always of interest to users. For instance, when a user reads ...
The essence of time: considering temporal relevance as an intent-aware ranking problem
Stewart Whiting
Pages: 1000-1000
doi>10.1145/2348283.2348424

Real-time news and social media quickly reflect large-scale phenomena and events. As users become exposed to this information, time plays a central role in prompting both information authorship and seeking activities. The objective of this research is ...
DEMONSTRATION SESSION: Demonstrations
A framework for manipulating and searching multiple retrieval types
Marc-Allen Cartright, Ethem F. Can, William Dabney, Jeff Dalton, Logan Giorda, Kriste Krstovski, Xiaoye Wu, Ismet Zeki Yalniz, James Allan, R. Manmatha, David A. Smith
Pages: 1001-1001
doi>10.1145/2348283.2348426

Conventional retrieval systems view documents as a unit and look at different retrieval types within a document. We introduce Proteus, a framework for seamlessly navigating books as dynamic collections which are defined on the fly. Proteus allows us ...
A visual tool for Bayesian data analysis: the impact of smoothing on naive Bayes text classifiers
Giorgio Maria Di Nunzio, Alessandro Sordoni
Pages: 1002-1002
doi>10.1145/2348283.2348427

Naive-Bayes (NB) classifiers are simple probabilistic classifiers still widely used in supervised learning due to their tradeoff between efficient model training and good empirical results. One of the drawbacks of these classifiers is that in situations ...
ALF: a client side logger and server for capturing user interactions in web applications
Leif Azzopardi, Myles Doolan, Richard Glassey
Pages: 1003-1003
doi>10.1145/2348283.2348428

This demonstration paper introduces ALF which provides a light-weight client side logging application and a server for collecting user interaction data. ALF has been designed as a loosely coupled independent service that runs in parallel with the IR ...
ChatNoir: a search engine for the ClueWeb09 corpus
Martin Potthast, Matthias Hagen, Benno Stein, Jan Graßegger, Maximilian Michel, Martin Tippmann, Clement Welsch
Pages: 1004-1004
doi>10.1145/2348283.2348429

We present the ChatNoir search engine which indexes the entire English part of the ClueWeb09 corpus. Besides Carnegie Mellon's Indri system, ChatNoir is the second publicly available search engine for this corpus. It implements the classic BM25F information ...
CrowdTerrier: automatic crowdsourced relevance assessments with Terrier
Richard McCreadie, Craig Macdonald, Iadh Ounis
Pages: 1005-1005
doi>10.1145/2348283.2348430

In this demo, we present CrowdTerrier, an infrastructure extension to the open source Terrier IR platform that enables the semi-automatic generation of relevance assessments for a variety of document ranking tasks using crowdsourcing. The aim of CrowdTerrier ...
Distilling and exploring nuggets from a corpus
Vittorio Castelli, Hema Raghavan, Radu Florian, Ding-Jung Han, Xiaoqiang Luo, Salim Roukos
Pages: 1006-1006
doi>10.1145/2348283.2348431

This paper describes a live and scalable system that automatically extracts information nuggets for entities/topics from a continuously updated corpus for effective exploration and analysis. A nugget is a piece of semantic information that (1) must be ...
Integrative online research-data management
Michael Huggett, Edie Rasmussen
Pages: 1007-1007
doi>10.1145/2348283.2348432

In support of our research projects in information retrieval, we have developed an integrated multi-process software system that shepherds research data from induction through aggregation, analysis, and presentation. We combine public-domain code libraries ...
MaSe: create your own mash-up search interface
Leif Azzopardi, Douglas Dowie, Kelly Ann Marshall, Richard Glassey
Pages: 1008-1008
doi>10.1145/2348283.2348433

MaSe provides a sandbox environment for high school students to create their own personalised search interface. It has been designed with two major goals in mind: (1) as a hands-on tutorial for school children, to excite them about programming and computing ...
myDJ: recommending karaoke songs from one's own voice
Kuang Mao, Xinyuan Luo, Ke Chen, Gang Chen, Lidan Shou
Pages: 1009-1009
doi>10.1145/2348283.2348434

In this demo, we present myDJ, a karaoke recommendation system which recommends the songs people are capable of singing. Different from the existing song recommendation systems which recommend songs people like to listen to, myDJ can recommend proper songs ...
PageFetch: a retrieval game for children (and adults)
Leif Azzopardi, Jim Purvis, Richard Glassey
Pages: 1010-1010
doi>10.1145/2348283.2348435

Children often struggle with information retrieval tasks as searching for information often requires a developed vocabulary and strong categorisation skills; neither of which are particularly developed in children under the age of 12. In a study conducted ...
Pictune: situational music recommendation from geotagged pictures
Ke Chen, Gang Chen, Lidan Shou, Fei Xia
Pages: 1011-1011
doi>10.1145/2348283.2348436
Political search trends
Ingmar Weber, Venkata Rama Kiran Garimella, Erik Borra
Pages: 1012-1012
doi>10.1145/2348283.2348437

We present Political Search Trends, a browser based web search analysis tool that (i) assigns a political leaning to web search queries, (ii) detects trending political queries in a given week, and (iii) links search queries to fact-checked statements. ...
RDF Xpress: a flexible expressive RDF search engine
Shady Elbassuoni, Maya Ramanath, Gerhard Weikum
Pages: 1013-1013
doi>10.1145/2348283.2348438

We demonstrate RDF Xpress, a search engine that enables users to effectively retrieve information from large RDF knowledge bases or Linked Data Sources. RDF Xpress provides a search interface where users can combine triple patterns with keywords to form ...
Sketch-based image similarity search with a pen and paper interface
Ihab Al Kabary, Heiko Schuldt
Pages: 1014-1014
doi>10.1145/2348283.2348439

We present a novel and innovative user interface for query-by-sketching based image retrieval that exploits emergent interactive paper and digital pen technology. Users can draw sketches with a digital pen on interactive paper in a user-friendly way. ...
Task-aware search assistant
Henry Allen Feild, James Allan
Pages: 1015-1015
doi>10.1145/2348283.2348440
TweetSpector: entity-based retrieval of tweets
Surender Reddy Yerva, Zoltan Miklos, Flavia Grosan, Alexandru Tandrau, Karl Aberer
Pages: 1016-1016
doi>10.1145/2348283.2348441

TweetSpector is a tool for demonstrating entity-based retrieval of tweets. The various features of this tool include: entity profile creation, real-time tweet classification, active improvement of the created profiles through user feedback, and the ...
YooSee: a video browsing application for young children
Leif Azzopardi, Douglas Dowie, Kelly Ann Marshall
Pages: 1017-1017
doi>10.1145/2348283.2348442

Nowadays children as young as two years old can easily interact with mobile touch screen devices and personal computers to watch online videos through services such as YouTube. However, such services present a number of challenges for young children ...
Multi-platform image search using tag enrichment
Jinming Min, Cristover Lopes, Johannes Leveling, Dag Schmidtke, Gareth J.F. Jones
Pages: 1018-1018
doi>10.1145/2348283.2348443

The number of images available online is growing steadily and current web search engines have indexed more than 10 billion images. Approaches to image retrieval are still often text-based and operate on image annotations and captions. Image annotations ...
SESSION: Industry talk abstracts
IR paradigms in computational advertising
Andrei Z. Broder
Pages: 1019-1019
doi>10.1145/2348283.2348445

The central problem in the emerging discipline of computational advertising is to find the "best match" between a given user in a given context and a suitable advertisement. The context could be a user entering a query in a search engine ("sponsored ...
Watson: the Jeopardy! challenge and beyond
Eric W. Brown
Pages: 1020-1020
doi>10.1145/2348283.2348446

Watson, named after IBM founder Thomas J. Watson, was built by a team of IBM researchers who set out to accomplish a grand challenge: build a computing system that rivals a human's ability to answer questions posed in natural language with speed, accuracy ...
Putting context into search and search into context
Susan T. Dumais
Pages: 1021-1021
doi>10.1145/2348283.2348447

It is a very challenging task to understand a short query, especially if that query is considered in isolation. Luckily, queries do not magically appear in a search box -- rather, they are issued by real people, trying to accomplish a task, at a given point ...
CloudSearch and the democratization of information retrieval
Daniel E. Rose
Pages: 1022-1023
doi>10.1145/2348283.2348448

Amazon CloudSearch is a new hosted search service, built on top of many cloud-based AWS services, and based on the same technology that powers search on Amazon's retail sites. Because of its ease of configuration and scalability, CloudSearch represents ...
Entity sentiment extraction using text ranking
John O'Neil
Pages: 1024-1024
doi>10.1145/2348283.2348449

Entity extraction and sentiment classification are among the most common types of information derived from documents, but the problem of directly associating entities and sentiment has received less attention. We use TextRank on a graph linking entities ...
POSTER SESSION: Poster abstracts
A hybrid model for ad-hoc information retrieval
Zheng Ye, Jimmy Xiangji Huang, Jun Miao
Pages: 1025-1026
doi>10.1145/2348283.2348451

Many information retrieval (IR) techniques have been proposed to improve the performance, and some combinations of these techniques have been demonstrated to be effective. However, how to effectively combine them is largely unexplored. It is possible ...
Exploiting paths for entity search in RDF graphs
Minsuk Kahng, Sang-goo Lee
Pages: 1027-1028
doi>10.1145/2348283.2348452

The field of entity search using Semantic Web (RDF) data has gained more interest recently. In this paper, we propose a probabilistic entity retrieval model for RDF graphs using paths in the graph. Unlike previous work which assumes that all descriptions ...
A study of term weighting schemes using class information for text classification
Youngjoong Ko
Pages: 1029-1030
doi>10.1145/2348283.2348453
A topic model of clinical reports
Corey Arnold, William Speier
Pages: 1031-1032
doi>10.1145/2348283.2348454

Clinical narrative in the medical record provides perhaps the most detailed account of a patient's history. However, this information is documented in free-text, which makes it challenging to analyze. Efforts to index unstructured clinical narrative ...
Active query selection for learning rankers
Mustafa Bilgic, Paul N. Bennett
Pages: 1033-1034
doi>10.1145/2348283.2348455

Methods that reduce the amount of labeled data needed for training have focused more on selecting which documents to label than on which queries should be labeled. One exception to this (Long et al. 2010) uses expected loss optimization (ELO) to estimate ...
Anticipatory search: using context to initiate search
Daniel J. Liebling, Paul N. Bennett, Ryen W. White
Pages: 1035-1036
doi>10.1145/2348283.2348456

Identifying content for which a user may search has a variety of applications, including ranking and recommendation. In this poster, we examine how pre-search context can be used to predict content that the user will seek before they have even specified ...
BReK12: a book recommender for K-12 users
Maria Soledad Pera, Yiu-Kai Ng
Pages: 1037-1038
doi>10.1145/2348283.2348457

Ideally, students in K-12 grade levels can turn to book recommenders to locate books that match their interests. Existing book recommenders, however, fail to take into account the readability levels of their users, and hence their recommendations may ...
Clarity re-visited
Shay Hummel, Anna Shtok, Fiana Raiber, Oren Kurland, David Carmel
Pages: 1039-1040
doi>10.1145/2348283.2348458

We present a novel interpretation of Clarity [5], a widely used query performance predictor. While Clarity is commonly described as a measure of the "distance" between the language model of the top-retrieved documents and that of the collection, we show ...
Cluster-based one-class ensemble for classification problems in information retrieval
Nedim Lipka, Benno Stein, Maik Anderka
Pages: 1041-1042
doi>10.1145/2348283.2348459

A number of relevant information retrieval classification problems are one-class classification problems at heart. That is, labeled data is only available for one class, the so-called target class, and common discrimination-based classification approaches, ...
Collaborative filtering with short term preferences mining
Diyi Yang, Tianqi Chen, Weinan Zhang, Yong Yu
Pages: 1043-1044
doi>10.1145/2348283.2348460

Recently, recommender systems have fascinated researchers and benefited a variety of people's online activities, enabling users to cope with the explosive growth of web information. Traditional collaborative filtering techniques handle the general recommendation ...
Creating temporally dynamic web search snippets
Krysta M. Svore, Jaime Teevan, Susan T. Dumais, Anagha Kulkarni
Pages: 1045-1046
doi>10.1145/2348283.2348461

Content on the Internet is always changing. We explore the value of biasing search result snippets towards new webpage content. We present results from a user study comparing traditional query-focused snippets with snippets that emphasize new page content ...
Dependency trigram model for social relation extraction from news articles
Maengsik Choi, Harksoo Kim, Bruce W. Croft
Pages: 1047-1048
doi>10.1145/2348283.2348462

We propose a kernel-based model to automatically extract social relations such as economic relations and political relations between two people from news articles. To determine whether two people are structurally associated with each other, the proposed ...
Detecting candidate named entities in search queries
Areej Alasiry, Mark Levene, Alexandra Poulovassilis
Pages: 1049-1050
doi>10.1145/2348283.2348463

The information extraction task of Named Entity Recognition (NER) has recently been applied to search engine queries in order to better understand their semantics. Here we concentrate on the task prior to the classification of the named entities ...
Effect of dynamic pruning safety on learning to rank effectiveness
Craig Macdonald, Nicola Tonellotto, Iadh Ounis
Pages: 1051-1052
doi>10.1145/2348283.2348464

A dynamic pruning strategy, such as WAND, enhances retrieval efficiency without degrading effectiveness to a given rank K, known as safe-to-rank-K. However, it is also possible for WAND to obtain more efficient but unsafe retrieval without actually significantly ...
Effect of written instructions on assessor agreement
William Webber, Bryan Toth, Marjorie Desamito
Pages: 1053-1054
doi>10.1145/2348283.2348465

Assessors frequently disagree on the topical relevance of documents. How much of this disagreement is due to ambiguity in assessment instructions? We have two assessors assess TREC Legal Track documents for relevance, some to a general topic description, ...
Effects of expertise differences in synchronous social Q&A
Ryen W. White, Matthew Richardson
Pages: 1055-1056
doi>10.1145/2348283.2348466

Synchronous social question-and-answer (Q&A) systems match askers to answerers and support real-time dialog between them to resolve questions. These systems typically find answerers based on the degree of expertise match with the asker's initial ...
Efficient estimation of aspect weights
Jon Parker, Andrew Yates, Nazli Goharian, Wai Gen Yee
Pages: 1057-1058
doi>10.1145/2348283.2348467

Many websites encourage people to submit reviews of various products and services. We present and evaluate a novel approach to efficiently model and analyze the text within user reviews to estimate how much reviewers care about different aspects of a ...
Emotion tagging for comments of online news by meta classification with heterogeneous information sources
Ying Zhang, Yi Fang, Xiaojun Quan, Lin Dai, Luo Si, Xiaojie Yuan
Pages: 1059-1060
doi>10.1145/2348283.2348468

With the rapid growth of online news services, users can actively respond to online news by making comments. Users often express subjective emotions in comments such as sadness, surprise and anger. Such emotions can help understand the preferences and ...
Estimating the magic barrier of recommender systems: a user study
Alan Said, Brijnesh J. Jain, Sascha Narr, Till Plumbaum, Sahin Albayrak, Christian Scheel
Pages: 1061-1062
doi>10.1145/2348283.2348469

Recommender systems are commonly evaluated by trying to predict known, withheld ratings for a set of users. Measures such as the Root-Mean-Square Error are used to estimate the quality of the recommender algorithms. This process does not, however, acknowledge ...
Explaining neighborhood-based recommendations
Sergio Cleger-Tamayo, Juan M. Fernandez-Luna, Juan F. Huete
Pages: 1063-1064
doi>10.1145/2348283.2348470

Recommender Systems (RS) attempt to discover users' preferences, and to learn about them in order to anticipate their needs. The main task normally associated with a RS is to offer suggestions for items. However, for most users, RSs are black boxes, ...
Exploiting term dependence while handling negation in medical search
Nut Limsopatham, Craig Macdonald, Richard McCreadie, Iadh Ounis
Pages: 1065-1066
doi>10.1145/2348283.2348471

In medical records, negative qualifiers, e.g. "no" or "without", are commonly used by health practitioners to identify the absence of a medical condition. Without considering whether the term occurs in a negative or positive context, the sole presence of ...
Exploring example-based person search in email
Tan Xu, Douglas W. Oard
Pages: 1067-1068
doi>10.1145/2348283.2348472

This paper describes an entity ranking model for example-based person search in email. Evaluation by comparison to manually resolved named references in Enron email yields results that correspond to typically placing the correct entity in the first or ...
Exploring tag relevance for image tag re-ranking
Jie Xiao, Wengang Zhou, Qi Tian
Pages: 1069-1070
doi>10.1145/2348283.2348473

In this paper, we propose to explore the relevance between tags for image tag re-ranking. The key component is to define a global tag-tag similarity matrix, which is achieved by analysis in both semantic and visual aspects. The text semantic relevance ...
Fast on-line learning for multilingual categorization
Michelle Kovesi, Cyril Goutte, Massih-Reza Amini
Pages: 1071-1072
doi>10.1145/2348283.2348474

Multiview learning has been shown to be a natural and efficient framework for supervised or semi-supervised learning of multilingual document categorizers. The state-of-the-art co-regularization approach relies on alternate minimizations of a combination ...
Finding interesting posts in Twitter based on retweet graph analysis
Min-Chul Yang, Jung-Tae Lee, Seung-Wook Lee, Hae-Chang Rim
Pages: 1073-1074
doi>10.1145/2348283.2348475

Millions of posts are being generated in real-time by users in social networking services, such as Twitter. However, a considerable number of those posts are mundane posts that are of interest to the authors and possibly their friends only. This paper ...
Finding readings for scientists from social websites
Jiepu Jiang, Zhen Yue, Shuguang Han, Daqing He
Pages: 1075-1076
doi>10.1145/2348283.2348476

Current search systems are designed to find relevant articles, especially topically relevant ones, but the notion of relevance largely depends on search tasks. We study the specific task in which scientists search for worth-reading articles beneficial ...
Finding web appearances of social network users via latent factor model
Kailong Chen, Zhengdong Lu, Xiaoshi Yin, Yong Yu, Zaiqing Nie
Pages: 1077-1078
doi>10.1145/2348283.2348477

With the rapid growth of Web 2.0, people spend more time on social networks such as Facebook and Twitter. To know the people they are interacting with, social network users can benefit greatly from finding those people's web appearances. ...
Fixed versus dynamic co-occurrence windows in TextRank term weights for information retrieval
Wei Lu, Qikai Cheng, Christina Lioma
Pages: 1079-1080
doi>10.1145/2348283.2348478

TextRank is a variant of PageRank typically used in graphs that represent documents, and where vertices denote terms and edges denote relations between terms. Quite often the relation between terms is simple term co-occurrence within a fixed window of ...
Gender-aware re-ranking
Eugene Kharitonov, Pavel Serdyukov
Pages: 1081-1082
doi>10.1145/2348283.2348479

In this paper we study the usefulness of users' gender information for improving the ranking of ambiguous queries in personalized and non-contextual settings. This study is performed as a sequence of offline re-ranking experiments and it demonstrates that the ...
Genre classification for million song dataset using confidence-based classifiers combination
Yajie Hu, Mitsunori Ogihara
Pages: 1083-1084
doi>10.1145/2348283.2348480

We proposed a method to classify songs in the Million Song Dataset according to song genre. Since songs have several data types, we trained sub-classifiers on different types of data. These sub-classifiers are combined using both classifier authority ...
GLASE 0.1: eyes tell more than mice
Viktors Garkavijs, Mayumi Toshima, Noriko Kando
Pages: 1085-1086
doi>10.1145/2348283.2348481

This paper proposes a prototype system called Gaze-Learning-Access-and-Search-Engine 0.1 (GLASE), which can perform image relevance ranking based on gaze data and within-session learning. We developed a search user interface that uses an eye-tracker ...
How query extensions reflect search result abandonments
Aleksandr Chuklin, Pavel Serdyukov
Pages: 1087-1088
doi>10.1145/2348283.2348482

It is often considered that a high abandonment rate corresponds to poor IR system performance. However, several studies have suggested that there are so-called good abandonments, i.e. situations when the search engine result page contains enough details to ...
Identifying entity aspects in microblog posts
Damiano Spina, Edgar Meij, Maarten de Rijke, Andrei Oghina, Minh Thuong Bui, Mathias Breuss
Pages: 1089-1090
doi>10.1145/2348283.2348483

Online reputation management is about monitoring and handling the public image of entities (such as companies) on the Web. An important task in this area is identifying "aspects" of the entity of interest (such as products, services, competitors, key ...
Impact of assessor disagreement on ranking performance
Pavel Metrikov, Virgil Pavlu, Javed A. Aslam
Pages: 1091-1092
doi>10.1145/2348283.2348484

We consider the impact of inter-assessor disagreement on the maximum performance that a ranker can hope to achieve. We demonstrate that even if a ranker were to achieve perfect performance with respect to a given assessor, when evaluated with respect ...
Incorporating statistical topic information in relevance feedback
Karla L. Caballero, Ram Akella
Pages: 1093-1094
doi>10.1145/2348283.2348485

Most relevance feedback algorithms use only document terms as feedback (local features) in order to update the query and re-rank the documents shown to the user. This approach is limited by the terms of those documents and lacks any global context. ...
Inferring missing relevance judgments from crowd workers via probabilistic matrix factorization
Hyun Joon Jung, Matthew Lease
Pages: 1095-1096
doi>10.1145/2348283.2348486

In crowdsourced relevance judging, each crowd worker typically judges only a small number of examples, yielding a sparse and imbalanced set of judgments in which relatively few workers influence output consensus labels, particularly with simple consensus ...
Investigating performance predictors using Monte Carlo simulation and score distribution models
Ronan Cummins
Pages: 1097-1098
doi>10.1145/2348283.2348487

The standard deviation of scores in the top k documents of a ranked list has been shown to be significantly correlated with average precision and has been the basis of a number of query performance predictors. In this paper, we outline two hypotheses ...
Learning to select a time-aware retrieval model
Nattiya Kanhabua, Klaus Berberich, Kjetil Nørvåg
Pages: 1099-1100
doi>10.1145/2348283.2348488

Time-aware retrieval models exploit one of two time dimensions, namely, (a) publication time or (b) content time (temporal expressions mentioned in documents). We show that the effectiveness for a temporal query (e.g., illinois earthquake ...
Learning-based time-sensitive re-ranking for web search
Po-Tzu Chang, Yen-Chieh Huang, Cheng-Lun Yang, Shou-De Lin, Pu-Jen Cheng
Pages: 1101-1102
doi>10.1145/2348283.2348489

To model time-dependent user intent for Web search, this paper proposes a novel method using machine learning techniques to exploit temporal features for effective time-sensitive search result re-ranking. We propose models to incorporate users' click ...
Lightweight contrastive summarization for news comment mining
Gobaan Raveendran, Charles L.A. Clarke
Pages: 1103-1104
doi>10.1145/2348283.2348490

We develop and discuss a news comment miner that presents distinct viewpoints on a given theme or event. Given a query, the system uses metasearch techniques to find relevant news articles. Relevant articles are then scraped for both article content ...
Looking inside the box: context-sensitive translation for cross-language information retrieval
Ferhan Ture, Jimmy Lin, Douglas W. Oard
Pages: 1105-1106
doi>10.1145/2348283.2348491

Cross-language information retrieval (CLIR) today is dominated by techniques that use token-to-token mappings from bilingual dictionaries. Yet, state-of-the-art statistical translation models (e.g., using Synchronous Context-Free Grammars) are far richer, ...
Making results fit into 40 characters: a study in document rewriting
Johannes Leveling, Gareth J.F. Jones
Pages: 1107-1108
doi>10.1145/2348283.2348492

With the increasing popularity of mobile and hand-held devices, automatic approaches for adapting results to the limited screen size of mobile devices are becoming more important. Traditional approaches for reducing the length of textual results include ...
New assessment criteria for query suggestion
Zhongrui Ma, Yu Chen, Ruihua Song, Tetsuya Sakai, Jiaheng Lu, Ji-Rong Wen
Pages: 1109-1110
doi>10.1145/2348283.2348493

Query suggestion is a useful tool to help users express their information needs by supplying alternative queries. When evaluating the effectiveness of query suggestion algorithms, many previous studies focus on measuring whether a suggestion query is ...
On automatically tagging web documents from examples
Nicholas Joel Woodward, Weijia Xu, Kent Norsworthy
Pages: 1111-1112
doi>10.1145/2348283.2348494

An emerging need in information retrieval is to identify a set of documents conforming to an abstract description. This task presents two major challenges to existing methods of document retrieval and classification. First, similarity based on overall ...
On building a reusable Twitter corpus
Richard McCreadie, Ian Soboroff, Jimmy Lin, Craig Macdonald, Iadh Ounis, Dean McCullough
Pages: 1113-1114
doi>10.1145/2348283.2348495

The Twitter real-time information network is the subject of research for information retrieval tasks such as real-time search. However, so far, reproducible experimentation on Twitter data has been impeded by restrictions imposed by the Twitter terms ...
On judgments obtained from a commercial search engine
Emine Yilmaz, Gabriella Kazai, Nick Craswell, Saied Mehrizi Tahaghoghi
Pages: 1115-1116
doi>10.1145/2348283.2348496

In information retrieval, relevance judgments play an important role as they are required both for evaluating the quality of retrieval systems and for training learning to rank algorithms. In recent years, numerous papers have been published using judgments ...
On the mathematical relationship between expected n-call@k and the relevance vs. diversity trade-off
Kar Wai Lim, Scott Sanner, Shengbo Guo
Pages: 1117-1118
doi>10.1145/2348283.2348497

It has been previously noted that optimization of the n-call@k relevance objective (i.e., a set-based objective that is 1 if at least n documents in a set of k are relevant, otherwise 0) encourages more result set diversification ...
On real-time ad-hoc retrieval evaluation
Stephen E. Robertson, Evangelos Kanoulas
Pages: 1119-1120
doi>10.1145/2348283.2348498

Lab-based evaluations typically assess the quality of a retrieval system with respect to its ability to retrieve documents that are relevant to the information need of an end user. In a real-time search task, however, users not only wish to retrieve the ...
Opinion summarisation through sentence extraction: an investigation with movie reviews
Marco Bonzanini, Miguel Martinez-Alvarez, Thomas Roelleke
Pages: 1121-1122
doi>10.1145/2348283.2348499

In on-line reviews, authors often use a short passage to describe the overall feeling about a product or a service. A review as a whole can mention many details not in line with the overall feeling, so capturing this key passage is important to understand ...
Optimizing parameters of the expected reciprocal rank
Yury Logachev, Pavel Serdyukov
Pages: 1123-1124
doi>10.1145/2348283.2348500

Most popular IR metrics are parameterized. Usually the parameters of these metrics are chosen on the basis of general considerations and are not adjusted by experiments with real users. In particular, the parameters of the Expected Reciprocal Rank measure are ...
Ousting ivory tower research: towards a web framework for providing experiments as a service
Tim Gollub, Benno Stein, Steven Burrows
Pages: 1125-1126
doi>10.1145/2348283.2348501

With its close ties to the Web, the IR community is destined to leverage the dissemination and collaboration capabilities that the Web provides today. Especially with the advent of the software as a service principle, an IR community is conceivable that ...
Parallelizing ListNet training using Spark
Shilpa Shukla, Matthew Lease, Ambuj Tewari
Pages: 1127-1128
doi>10.1145/2348283.2348502

As ever-larger training sets for learning to rank are created, scalability of learning has become increasingly important to achieving continuing improvements in ranking accuracy. Exploiting independence of "summation form" computations, we show how each ...
Predicting lifespans of popular tweets in microblog
Shoubin Kong, Ling Feng, Guozheng Sun, Kan Luo
Pages: 1129-1130
doi>10.1145/2348283.2348503

In microblogs such as Twitter, popular tweets are usually retweeted by many users. Tweet lifespans (i.e., how long tweets stay popular) vary from tweet to tweet. This paper presents a simple yet effective approach to predict the lifespans of ...
Preliminary study of technical terminology for the retrieval of scientific book metadata records
Birger Larsen, Christina Lioma, Ingo Frommholz, Hinrich Schütze
Pages: 1131-1132
doi>10.1145/2348283.2348504

Books only represented by brief metadata (book records) are particularly hard to retrieve. One way of improving their retrieval is by extracting retrieval enhancing features from them. This work focusses on scientific (physics) book records. We ...
Queries without clicks: evaluating retrieval effectiveness based on user feedback
Athanasia Koumpouri, Vasiliki Simaki
Pages: 1133-1134
doi>10.1145/2348283.2348505

Until recently, the lack of user activity on search results was perceived as a sign of user dissatisfaction with retrieval performance. However, recent studies have reported that some queries might not be followed by clicks to the content of the retrieved ...
Retrieval evaluation on focused tasks
Besnik Fetahu, Ralf Schenkel
Pages: 1135-1136
doi>10.1145/2348283.2348506

Ranking retrieval systems for focused tasks requires a large number of relevance judgments. We propose an approach that minimizes the number of relevance judgments, where the performance measures are approximated using a Monte Carlo sampling technique. ...
Rewarding term location information to enhance probabilistic information retrieval
Jiashu Zhao, Jimmy Xiangji Huang, Shicheng Wu
Pages: 1137-1138
doi>10.1145/2348283.2348507

We investigate the effect of rewarding terms according to their locations in documents for probabilistic information retrieval. The intuition behind our approach is that many authors summarize their ideas in some particular parts of ...
Scheduling queries across replicas
Ana Freire, Craig Macdonald, Nicola Tonellotto, Iadh Ounis, Fidel Cacheda
Pages: 1139-1140
doi>10.1145/2348283.2348508

For increased efficiency, an information retrieval system can split its index into multiple shards, and then replicate these shards across many query servers. For each new query, an appropriate replica for each shard must be selected, such that the query ...
Re-examining search result snippet examination time for relevance estimation
Dmitry Lagun, Eugene Agichtein
Pages: 1141-1142
doi>10.1145/2348283.2348509

Previous studies of web search result examination have provided valuable insights in understanding and modelling searcher behavior. Yet, recent work (e.g., [3]) has been developed based on the assumption that the time a searcher spends examining a particular ...
Sentiment identification by incorporating syntax, semantics and context information
Kunpeng Zhang, Yusheng Xie, Yu Cheng, Daniel Honbo, Doug Downey, Ankit Agrawal, Wei-keng Liao, Alok Choudhary
Pages: 1143-1144
doi>10.1145/2348283.2348510

This paper proposes a method based on conditional random fields to incorporate sentence structure (syntax and semantics) and context information to identify sentiments of sentences within a document. It also proposes and evaluates two different active ...
Short text classification using very few words
Aixin Sun
Pages: 1145-1146
doi>10.1145/2348283.2348511

We propose a simple, scalable, and non-parametric approach for short text classification. Leveraging the well-studied and scalable Information Retrieval (IR) framework, our approach mimics the human labeling process for a piece of short text. It first selects ...
Summarizing the differences from microblogs
Dingding Wang, Mitsunori Ogihara, Tao Li
Pages: 1147-1148
doi>10.1145/2348283.2348512

With the rapid growth of social media websites, microblogging has become a popular way to spread instant news and events. Due to the dynamic and social nature of microblogs, extracting useful information from microblogs is more challenging than from ...
Survival analysis of click logs
Si-Chi Chin, W. Nick Street
Pages: 1149-1150
doi>10.1145/2348283.2348513

Click logs from search engines provide a rich opportunity to acquire implicit feedback from users. Patterns derived from the time between a posted query and a click provide information on the ranking quality, reflecting the perceived relevance of a retrieved ...
Text selections as implicit relevance feedback
Ryen W. White, Georg Buscher
Pages: 1151-1152
doi>10.1145/2348283.2348514

Users' search activity has been used as implicit feedback to model search interests and improve the performance of search systems. In search engines, this behavior usually takes the form of queries and result clicks. However, richer data on how people ...
Time to judge relevance as an indicator of assessor error
Mark D. Smucker, Chandra Prakash Jethani
Pages: 1153-1154
doi>10.1145/2348283.2348515

When human assessors judge documents for their relevance to a search topic, it is possible for errors in judging to occur. As part of the analysis of the data collected from a 48-participant user study, we have discovered that when the participants made ...
Towards alias detection without string similarity: an active learning based approach
Lili Jiang, Jianyong Wang, Ping Luo, Ning An, Min Wang
Pages: 1155-1156
doi>10.1145/2348283.2348516

Entity aliases are common, and accurately detecting them plays a vital role in various applications. In this paper, we use an active-learning-based method to detect aliases without string similarity. To minimize the cost of pairwise comparison, ...
Towards zero-click mobile IR evaluation: knowing what and knowing when
Tetsuya Sakai
Pages: 1157-1158
doi>10.1145/2348283.2348517

In this poster, we propose two evaluation tasks for mobile information access. The first task evaluates the system's ability to guess what the user's query should be given a context ("Knowing What"). The second task evaluates the system's ability to ...
Twanchor text: a preliminary study of the value of tweets as anchor text
Gilad Mishne, Jimmy Lin
Pages: 1159-1160
doi>10.1145/2348283.2348518

It is well known that anchor text plays an important role in search, providing signals that are often not present in the source document itself. The paper reports results of a preliminary investigation on the value of tweets and tweet conversations as ...
Unsupervised linear score normalization revisited
Ilya Markov, Avi Arampatzis, Fabio Crestani
Pages: 1161-1162
doi>10.1145/2348283.2348519

We take a fresh look at score normalization for merging result lists, isolating the problem from other components. We focus on three of the simplest, practical, and widely used linear methods which do not require any training data, i.e. MinMax, Sum, ...
User-aware caching and prefetching query results in web search engines
Hongyuan Ma, Bin Wang
Pages: 1163-1164
doi>10.1145/2348283.2348520

Query results caching is an efficient technique for Web search engines. In this paper we present User-Aware Cache, a novel approach tailored for query results caching, that is based on user characteristics. We then use a trace of around 30 million queries ...
Using eye-tracking with dynamic areas of interest for analyzing interactive information retrieval
Vu Tuan Tran, Norbert Fuhr
Pages: 1165-1166
doi>10.1145/2348283.2348521

Based on a new framework for capturing dynamic areas of interest in eye-tracking, we model the user search process as a Markov chain. The analysis indicates possible system improvements and yields parameter estimates for the Interactive Probability Ranking ...
Using PageRank to infer user preferences
Praveen Chandar, Ben Carterette
Pages: 1167-1168
doi>10.1145/2348283.2348522

Recently, researchers have shown interest in the use of preference judgments for evaluation in the IR literature. Although preference judgments have several advantages over absolute judgments, one of the major disadvantages is that the number of judgments ...
Utilizing inter-document similarities in federated search
Savva Khalaman, Oren Kurland
Pages: 1169-1170
doi>10.1145/2348283.2348523

We demonstrate the merits of using inter-document similarities for federated search. Specifically, we study a results merging method that utilizes information induced from clusters of similar documents created across the lists retrieved from the ...
Want a coffee?: predicting users' trails
Wen Li, Carsten Eickhoff, Arjen P. de Vries
Pages: 1171-1172
doi>10.1145/2348283.2348524

Twitter and Foursquare are two well-connected platforms for sharing information where growing numbers of users post location-related messages. In contrast to the longitude-latitude geotags commonly used online, e.g., on photos and tweets, new place-tags ...
Will this #hashtag be popular tomorrow?
Zongyang Ma, Aixin Sun, Gao Cong
Pages: 1173-1174
doi>10.1145/2348283.2348525

Hashtags are widely used in Twitter to define a shared context for events or topics. In this paper, we aim to predict hashtag popularity in the near future (i.e., the next day). Given a hashtag that has the potential to be popular in the next day, we construct ...
$100,000 prize jackpot. call now!: identifying the pertinent features of SMS spam
Henry Tan, Nazli Goharian, Micah Sherr
Pages: 1175-1176
doi>10.1145/2348283.2348526

Mobile SMS spam is on the rise and is a prevalent problem. While recent work has shown that simple machine learning techniques can distinguish between ham and spam with high accuracy, this paper explores the individual contributions of various textual ...
TUTORIAL SESSION: Tutorial presentations
Beyond bag-of-words: machine learning for query-document matching in web search
Hang Li, Jun Xu
Pages: 1177-1177
doi>10.1145/2348283.2348528
Methods for mining and summarizing text conversations
Giuseppe Carenini, Gabriel Murray
Pages: 1178-1179
doi>10.1145/2348283.2348529

More and more today, people are engaging in conversations via email, blogs, discussion forums, text messaging and other social media. A person may want to archive these conversations and later retrieve information about what was discussed, or analyze ...
Crowdsourcing for search evaluation and social-algorithmic search
Matthew Lease, Omar Alonso
Pages: 1180-1180
doi>10.1145/2348283.2348530

The first computers were people. Today, Internet-based access to 24/7 online human crowds has led to a renaissance of research in human computation and the advent of crowdsourcing. These new opportunities have brought a disruptive shift to research and ...
(Big) usage data in web search
Ricardo Baeza-Yates, Yoelle Maarek
Pages: 1181-1182
doi>10.1145/2348283.2348531
A new look at old tricks: the fertile roots of current research
Paul Kantor
Pages: 1183-1183
doi>10.1145/2348283.2348532
Aspect-based opinion mining from product reviews
Samaneh Moghaddam, Martin Ester
Pages: 1184-1184
doi>10.1145/2348283.2348533

"What other people think" has always been an important piece of information for most of us during the decision-making process. Today people tend to make their opinions available to other people via the Internet. As a result, the Web has become an excellent ...
Experimental methods for information retrieval
Donald Metzler, Oren Kurland
Pages: 1185-1186
doi>10.1145/2348283.2348534
IR models: foundations and relationships
Thomas Roelleke
Pages: 1187-1188
doi>10.1145/2348283.2348535

In IR research it is essential to know IR models. Research over the past years has consolidated the foundations of IR models. Moreover, relationships have been reported that help to use and position IR models. Knowing about the foundations and relationships ...
Patent information retrieval: an instance of domain-specific search
Mihai Lupu
Pages: 1189-1190
doi>10.1145/2348283.2348536

The tutorial aims to provide IR researchers with an understanding of how the patent system works, the challenges that patent searchers face in using the existing tools and in adopting new methods developed in academia. At the same time, the tutorial ...
Medical information retrieval: an instance of domain-specific search
Allan Hanbury
Pages: 1191-1192
doi>10.1145/2348283.2348537

Due to an explosion in the amount of medical information available, search techniques are gaining importance in the medical domain. This tutorial discusses recent results on search in the medical domain, including the outcome of surveys on end user requirements, ...
Visual information retrieval using Java and LIRE
Oge Marques, Mathias Lux
Pages: 1193-1193
doi>10.1145/2348283.2348538

Visual information retrieval (VIR) is an active and vibrant research area, which attempts to provide means for organizing, indexing, annotating, and retrieving visual information (images and videos) from large, unstructured repositories. The goal of ...
Large-scale graph mining and learning for information retrieval
Bin Gao, Taifeng Wang, Tie-Yan Liu
Pages: 1194-1195
doi>10.1145/2348283.2348539

For many information retrieval applications, we need to deal with the ranking problem on very large scale graphs. However, it is non-trivial to perform efficient and effective ranking on them. On one aspect, we need to design scalable algorithms. On ...
Query performance prediction for IR
David Carmel, Oren Kurland
Pages: 1196-1197
doi>10.1145/2348283.2348540

The goal of this tutorial is to expose participants to current research on query performance prediction. Participants will become familiar with state-of-the-art performance prediction methods, with common evaluation methodologies of prediction quality, ...
Collaborative information seeking: art and science of achieving 1+1>2 in IR
Chirag Shah
Pages: 1198-1199
doi>10.1145/2348283.2348541

The assumption of information seekers being independent and of the IR problem being individual has been challenged often in the recent past, with an argument that the next big leap in search and retrieval will come through incorporating social and collaborative ...
Advances on the development of evaluation measures
Ben Carterette, Evangelos Kanoulas, Emine Yilmaz
Pages: 1200-1201
doi>10.1145/2348283.2348542

The goal of the tutorial is to provide attendees with a comprehensive overview of the latest advances in the development of information retrieval evaluation measures and discuss the current challenges in the area. A number of topics are covered, including ...


Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Table of Contents
SESSION: Keynote address 1
Future of the web and search
Qi Lu
Pages: 1-2
doi>10.1145/2009916.2009918

No one doubts that we have only scratched the surface of what is possible with the Web. The day is coming fast when the Web will become almost a virtual mind reader. Your intent, interests, and needs will be instantly perceived and the information you ...
SESSION: Keynote address 2
Beyond search: statistical topic models for text analysis
ChengXiang Zhai
Pages: 3-4
doi>10.1145/2009916.2009920

Search is generally a means to the end of finishing a task. While the current search engines are useful to users for finding relevant information, they offer little help to users for further digesting and analyzing the overwhelming amount of information found ...
SESSION: Users 1
Modeling and analysis of cross-session search tasks
Alexander Kotov, Paul N. Bennett, Ryen W. White, Susan T. Dumais, Jaime Teevan
Pages: 5-14
doi>10.1145/2009916.2009922

The information needs of search engine users vary in complexity, depending on the task they are trying to accomplish. Some simple needs can be satisfied with a single query, whereas others require a series of queries issued over a longer period of time. ...
The economics in interactive information retrieval
Leif Azzopardi
Pages: 15-24
doi>10.1145/2009916.2009923

Searching is inherently an interactive process usually requiring numerous iterations of querying and assessing in order to find the desired amount of relevant information. Essentially, the search process can be viewed as a combination of inputs (queries ...
Seeding simulated queries with user-study data for personal search evaluation
David Elsweiler, David E. Losada, José C. Toucedo, Ronald T. Fernandez
Pages: 25-34
doi>10.1145/2009916.2009924

In this paper we perform a lab-based user study (n=21) of email re-finding behaviour, examining how the characteristics of submitted queries change in different situations. A number of logistic regression models are developed on the query data to explore ...
Understanding re-finding behavior in naturalistic email interaction logs
David Elsweiler, Morgan Harvey, Martin Hacker
Pages: 35-44
doi>10.1145/2009916.2009925

In this paper we present a longitudinal, naturalistic study of email behavior (n=47) and describe our efforts at isolating re-finding behavior in the logs through various qualitative and quantitative analyses. The presented work underlines the methodological ...
SESSION: Query analysis I
People searching for people: analysis of a people search engine log
Wouter Weerkamp, Richard Berendsen, Bogomil Kovachev, Edgar Meij, Krisztian Balog, Maarten de Rijke
Pages: 45-54
doi>10.1145/2009916.2009927

Recent years show an increasing interest in vertical search: searching within a particular type of information. Understanding what people search for in these "verticals" gives direction to research and provides pointers for the search engines themselves. ...
Learning search tasks in queries and web pages via graph regularization
Ming Ji, Jun Yan, Siyu Gu, Jiawei Han, Xiaofei He, Wei Vivian Zhang, Zheng Chen
Pages: 55-64
doi>10.1145/2009916.2009928

As the Internet grows explosively, search engines play an increasingly important role for users in effectively accessing online information. Recently, it has been recognized that a query is often triggered by a search task that the user wants to accomplish. ...
Intentions and attention in exploratory health search
Marc-Allen Cartright, Ryen W. White, Eric Horvitz
Pages: 65-74
doi>10.1145/2009916.2009929

We study information goals and patterns of attention in exploratory search for health information on the Web, reporting results of a large-scale log-based study. We examine search activity associated with the goal of diagnosing illness from symptoms ...
User behavior in zero-recall ecommerce queries
Gyanit Singh, Nish Parikh, Neel Sundaresan
Pages: 75-84
doi>10.1145/2009916.2009930

User expectation and experience for web search and eCommerce (product) search are quite different. Product descriptions are concise as compared to typical web documents. User expectation is more specific to find the right product. The difference in the ...
SESSION: Learning to rank
Bagging gradient-boosted trees for high precision, low variance ranking models
Yasser Ganjisaffar, Rich Caruana, Cristina Videira Lopes
Pages: 85-94
doi>10.1145/2009916.2009932

Recent studies have shown that boosting provides excellent predictive performance across a wide variety of tasks. In Learning-to-rank, boosted models such as RankBoost and LambdaMART have been shown to be among the best performing learning methods based ...
Learning to rank for freshness and relevance
Na Dai, Milad Shokouhi, Brian D. Davison
Pages: 95-104
doi>10.1145/2009916.2009933

Freshness of results is important in modern web search. Failing to recognize the temporal aspect of a query can negatively affect the user experience, and make the search engine appear stale. While freshness and relevance can be closely related for some ...
A cascade ranking model for efficient ranked retrieval
Lidan Wang, Jimmy Lin, Donald Metzler
Pages: 105-114
doi>10.1145/2009916.2009934

There is a fundamental tradeoff between effectiveness and efficiency when designing retrieval models for large-scale document collections. Effectiveness tends to derive from sophisticated ranking functions, such as those constructed using learning to ...
Relevant knowledge helps in choosing right teacher: active query selection for ranking adaptation
Peng Cai, Wei Gao, Aoying Zhou, Kam-Fai Wong
Pages: 115-124
doi>10.1145/2009916.2009935

Learning to adapt in a new setting is a common challenge to our knowledge and capability. New life would be easier if we actively pursued supervision from the right mentor chosen with our relevant but limited prior knowledge. This variant principle of ...
SESSION: Personalization
SCENE: a scalable two-stage personalized news recommendation system
Lei Li, Dingding Wang, Tao Li, Daniel Knox, Balaji Padmanabhan
Pages: 125-134
doi>10.1145/2009916.2009937

Recommending news articles has become a promising research direction as the Internet provides fast access to real-time information from multiple sources around the world. Traditional news recommendation systems strive to adapt their services to individual ...
Inferring and using location metadata to personalize web search
Paul N. Bennett, Filip Radlinski, Ryen W. White, Emine Yilmaz
Pages: 135-144
doi>10.1145/2009916.2009938

Personalization of search results offers the potential for significant improvements in Web search. Among the many observable user attributes, approximate user location is particularly simple for search engines to obtain and allows personalization even ...
Active learning to maximize accuracy vs. effort in interactive information retrieval
Aibo Tian, Matthew Lease
Pages: 145-154
doi>10.1145/2009916.2009939

We consider an interactive information retrieval task in which the user is interested in finding several to many relevant documents with minimal effort. Given an initial document ranking, user interaction with the system produces relevance feedback (RF) ...
SESSION: Retrieval models I
CRTER: using cross terms to enhance probabilistic information retrieval
Jiashu Zhao, Jimmy Xiangji Huang, Ben He
Pages: 155-164
doi>10.1145/2009916.2009941

Term proximity retrieval rewards a document where the matched query terms occur close to each other. Although term proximity is known to be effective in many Information Retrieval (IR) applications, the within-document distribution of each individual ...
A boosting approach to improving pseudo-relevance feedback
Yuanhua Lv, ChengXiang Zhai, Wan Chen
Pages: 165-174
doi>10.1145/2009916.2009942

Pseudo-relevance feedback has proven effective for improving the average retrieval performance. Unfortunately, many experiments have shown that although pseudo-relevance feedback helps many queries, it also often hurts many other queries, limiting its ...
Enhancing ad-hoc relevance weighting using probability density estimation
Xiaofeng Zhou, Jimmy Xiangji Huang, Ben He
Pages: 175-184
doi>10.1145/2009916.2009943

Classical probabilistic information retrieval (IR) models, e.g. BM25, deal with document length based on a trade-off between the Verbosity hypothesis, which assumes the independence of a document's relevance from its length, and the Scope hypothesis, which ...
SESSION: Social media
Who should share what?: item-level social influence prediction for users and posts ranking
Peng Cui, Fei Wang, Shaowei Liu, Mingdong Ou, Shiqiang Yang, Lifeng Sun
Pages: 185-194
doi>10.1145/2009916.2009945

People and information are two core dimensions in a social network. People sharing information (such as blogs, news, albums, etc.) is the basic behavior. In this paper, we focus on predicting item-level social influence to answer the question Who should ...
Mining tags using social endorsement networks
Theodoros Lappas, Kunal Punera, Tamas Sarlos
Pages: 195-204
doi>10.1145/2009916.2009946

Entities on social systems, such as users on Twitter, and images on Flickr, are at the core of many interesting applications: they can be ranked in search results, recommended to users, or used in contextual advertising. Such applications assume knowledge ...
Crowdsourcing for book search evaluation: impact of HIT design on comparative system ranking
Gabriella Kazai, Jaap Kamps, Marijn Koolen, Natasa Milic-Frayling
Pages: 205-214
doi>10.1145/2009916.2009947

The evaluation of information retrieval (IR) systems over special collections, such as large book repositories, is out of reach of traditional methods that rely upon editorial relevance judgments. Increasingly, the use of crowdsourcing to collect relevance ...
SESSION: Content analysis
A site oriented method for segmenting web pages
David Fernandes, Edleno Silva de Moura, Altigran Soares da Silva, Berthier Ribeiro-Neto, Edisson Braga
Pages: 215-224
doi>10.1145/2009916.2009949

Information about how to segment a Web page can be used nowadays by applications such as segment-aware Web search, classification and link analysis. In this research, we propose a fully automatic method for page segmentation and evaluate its application ...
Composite hashing with multiple information sources
Dan Zhang, Fei Wang, Luo Si
Pages: 225-234
doi>10.1145/2009916.2009950

Similarity search applications with a large amount of text and image data demand an efficient and effective solution. One useful strategy is to represent the examples in databases as compact binary codes through semantic hashing, which has attracted ...
Detecting outlier sections in US congressional legislation
Elif Aktolga, Irene Ros, Yannick Assogba
Pages: 235-244
doi>10.1145/2009916.2009951

Reading congressional legislation, also known as bills, is often tedious because bills tend to be long and written in complex language. In IBM Many Bills, an interactive web-based visualization of legislation, users of different backgrounds can browse ...
DOM based content extraction via text density
Fei Sun, Dandan Song, Lejian Liao
Pages: 245-254
doi>10.1145/2009916.2009952

In addition to the main content, most web pages also contain navigation panels, advertisements and copyright and disclaimer notices. This additional content, which is also known as noise, is typically not related to the main subject and may hamper the ...
SESSION: Web IR
Social context summarization
Zi Yang, Keke Cai, Jie Tang, Li Zhang, Zhong Su, Juanzi Li
Pages: 255-264
doi>10.1145/2009916.2009954

We study a novel problem of social context summarization for Web documents. Traditional summarization research has focused on extracting informative sentences from standard documents. With the rapid growth of online social networks, abundant user generated ...
Probabilistic factor models for web site recommendation
Hao Ma, Chao Liu, Irwin King, Michael R. Lyu
Pages: 265-274
doi>10.1145/2009916.2009955

Due to the prevalence of personalization and information filtering applications, modeling users' interests on the Web has become increasingly important during the past few years. In this paper, aiming at providing accurate personalized Web site recommendations ...
Efficiently collecting relevance information from clickthroughs for web retrieval system evaluation
Jing He, Wayne Xin Zhao, Baihan Shu, Xiaoming Li, Hongfei Yan
Pages: 275-284
doi>10.1145/2009916.2009956

Various click models have been recently proposed as a principled approach to infer the relevance of documents from the clickthrough data. The inferred document relevance is potentially useful in evaluating the Web retrieval systems. In practice, it generally ...
Unsupervised query segmentation using clickthrough for information retrieval
Yanen Li, Bo-Jun Paul Hsu, ChengXiang Zhai, Kuansan Wang
Pages: 285-294
doi>10.1145/2009916.2009957

Query segmentation is an important task toward understanding queries accurately, which is essential for improving search results. Existing segmentation models either use labeled data to predict the segmentation boundaries, for which the training data ...
SESSION: Collaborative filtering I
Collaborative competitive filtering: learning recommender using context of user choice
Shuang-Hong Yang, Bo Long, Alexander J. Smola, Hongyuan Zha, Zhaohui Zheng
Pages: 295-304
doi>10.1145/2009916.2009959

While a user's preference is directly reflected in the interactive choice process between her and the recommender, this wealth of information was not fully exploited for learning recommender models. In particular, existing collaborative filtering (CF) ...
CLR: a collaborative location recommendation framework based on co-clustering
Kenneth Wai-Ting Leung, Dik Lun Lee, Wang-Chien Lee
Pages: 305-314
doi>10.1145/2009916.2009960

GPS data tracked on mobile devices contains rich information about human activities and preferences. In this paper, GPS data is used in location-based services (LBSs) to provide collaborative location recommendations. We observe that most existing LBSs ...
Functional matrix factorizations for cold-start recommendation
Ke Zhou, Shuang-Hong Yang, Hongyuan Zha
Pages: 315-324
doi>10.1145/2009916.2009961

A key challenge in recommender system research is how to effectively profile new users, a problem generally known as cold-start recommendation. Recently the idea of progressively querying user responses through an initial interview ...
Exploiting geographical influence for collaborative point-of-interest recommendation
Mao Ye, Peifeng Yin, Wang-Chien Lee, Dik-Lun Lee
Pages: 325-334
doi>10.1145/2009916.2009962

In this paper, we aim to provide a point-of-interest (POI) recommendation service for the rapidly growing location-based social networks (LBSNs), e.g., Foursquare, Whrrl, etc. Our idea is to explore user preference, social influence and geographical influence ...
SESSION: Users II
Why searchers switch: understanding and predicting engine switching rationales
Qi Guo, Ryen W. White, Yunqiao Zhang, Blake Anderson, Susan T. Dumais
Pages: 335-344
doi>10.1145/2009916.2009964

Search engine switching is the voluntary transition between Web search engines. Engine switching can occur for a number of reasons, including user dissatisfaction with search results, a desire for broader topic coverage or verification, user preferences, ...
Find it if you can: a game for modeling different types of web search success using interaction data
Mikhail Ageev, Qi Guo, Dmitry Lagun, Eugene Agichtein
Pages: 345-354
doi>10.1145/2009916.2009965

A better understanding of strategies and behavior of successful searchers is crucial for improving the experience of all searchers. However, research on search behavior has been struggling with the tension between the relatively small-scale, but controlled ...
Measuring improvement in user search performance resulting from optimal search tips
Neema Moraveji, Daniel Russell, Jacob Bien, David Mease
Pages: 355-364
doi>10.1145/2009916.2009966

Web search performance can be improved by either improving the search engine itself or by educating the user to search more efficiently. There is a large amount of literature describing techniques for measuring the former, whereas improvements resulting ...
ViewSer: enabling large-scale remote user studies of web search examination and interaction
Dmitry Lagun, Eugene Agichtein
Pages: 365-374
doi>10.1145/2009916.2009967

Web search behaviour studies, including eye-tracking studies of search result examination, have resulted in numerous insights to improve search result quality and presentation. Yet, eye tracking studies have been restricted in scale, due to the expense ...
SESSION: Query analysis II
CrowdLogging: distributed, private, and anonymous search logging
Henry Allen Feild, James Allan, Joshua Glatt
Pages: 375-384
doi>10.1145/2009916.2009969

We describe CrowdLogging, an approach for distributed search log collection, storage, and mining, with the dual goals of preserving privacy and making the mined information broadly available. Most search log mining approaches and most privacy enhancing ...
Out of sight, not out of mind: on the effect of social and physical detachment on information need
Elad Yom-Tov, Fernando Diaz
Pages: 385-394
doi>10.1145/2009916.2009970

The information needs of users and the documents which answer them are frequently contingent on the different characteristics of users. This is especially evident during natural disasters, such as earthquakes and violent weather incidents, which create ...
Scalable multi-dimensional user intent identification using tree structured distributions
Vinay Jethava, Liliana Calderón-Benavides, Ricardo Baeza-Yates, Chiranjib Bhattacharyya, Devdatt Dubhashi
Pages: 395-404
doi>10.1145/2009916.2009971

The problem of identifying user intent has received considerable attention in recent years, particularly in the context of improving the search experience via query contextualization. Intent can be characterized by multiple dimensions, which are often ...
Social annotation in query expansion: a machine learning approach
Yuan Lin, Hongfei Lin, Song Jin, Zheng Ye
Pages: 405-414
doi>10.1145/2009916.2009972

Automatic query expansion technologies have been proven to be effective in many information retrieval tasks. Most existing approaches are based on the assumption that the most informative terms in top-retrieved documents can be viewed as context of the ...
SESSION: Communities
Predicting web searcher satisfaction with existing community-based answers
Qiaoling Liu, Eugene Agichtein, Gideon Dror, Evgeniy Gabrilovich, Yoelle Maarek, Dan Pelleg, Idan Szpektor
Pages: 415-424
doi>10.1145/2009916.2009974

Community-based Question Answering (CQA) sites, such as Yahoo! Answers, Baidu Knows, Naver, and Quora, have been rapidly growing in popularity. The resulting archives of posted answers to questions, in Yahoo! Answers alone, already exceed in size 1 billion, ...
Competition-based user expertise score estimation
Jing Liu, Young-In Song, Chin-Yew Lin
Pages: 425-434
doi>10.1145/2009916.2009975

In this paper, we consider the problem of estimating the relative expertise score of users in community question answering services (CQA). Previous approaches typically only utilize the explicit question answering relationship between askers and ...
Learning online discussion structures by conditional random fields
Hongning Wang, Chi Wang, ChengXiang Zhai, Jiawei Han
Pages: 435-444
doi>10.1145/2009916.2009976

Online forum discussions are emerging as a valuable information repository, where knowledge is accumulated by the interaction among users, leading to multiple threads with structures. Such replying structure in each thread conveys important information ...
Mining topics on participations for community discovery
Guoqing Zheng, Jinwen Guo, Lichun Yang, Shengliang Xu, Shenghua Bao, Zhong Su, Dingyi Han, Yong Yu
Pages: 445-454
doi>10.1145/2009916.2009977

Community discovery on large-scale linked document corpora has been a hot research topic for decades. There are two types of links. The first one, which we call d2d-link, indicates connectivity among different documents, such as blog references and ...
SESSION: Classification
Authorship classification: a discriminative syntactic tree mining approach
Sangkyum Kim, Hyungsul Kim, Tim Weninger, Jiawei Han, Hyun Duk Kim
Pages: 455-464
doi>10.1145/2009916.2009979

In the past, there have been dozens of studies on automatic authorship classification, and many of these studies concluded that the writing style is one of the best indicators for original authorship. From among the hundreds of features which were developed, ...
On theme location discovery for travelogue services
Mao Ye, Rong Xiao, Wang-Chien Lee, Xing Xie
Pages: 465-474
doi>10.1145/2009916.2009980

In this paper, we aim to develop a travelogue service that discovers and conveys various travelogue digests, in the form of theme locations, geographical scope, traveling trajectory and location snippet, to users. In this service, theme locations in a travelogue ...
Effective sentiment stream analysis with self-augmenting training and demand-driven projection
Ismael Santana Silva, Janaína Gomide, Adriano Veloso, Wagner Meira, Jr., Renato Ferreira
Pages: 475-484
doi>10.1145/2009916.2009981

How do we analyze sentiments over a set of opinionated Twitter messages? This issue has been widely studied in recent years, with a prominent approach being based on the application of classification techniques. Basically, messages are classified according ...
SESSION: Retrieval models II
Hypergeometric language models for republished article finding
Manos Tsagkias, Maarten de Rijke, Wouter Weerkamp
Pages: 485-494
doi>10.1145/2009916.2009983

Republished article finding is the task of identifying instances of articles that have been published in one source and republished more or less verbatim in another source, which is often a social media source. We address this task as an ad hoc retrieval ...
Estimation methods for ranking recent information
Miles Efron, Gene Golovchinsky
Pages: 495-504
doi>10.1145/2009916.2009984

Temporal aspects of documents can impact relevance for certain kinds of queries. In this paper, we build on earlier work of modeling temporal information. We propose an extension to the Query Likelihood Model that incorporates query-specific information ...
Query by document via a decomposition-based two-level retrieval approach
Linkai Weng, Zhiwei Li, Rui Cai, Yaoxue Zhang, Yuezhi Zhou, Laurence T. Yang, Lei Zhang
Pages: 505-514
doi>10.1145/2009916.2009985

Retrieving similar documents from a large-scale text corpus according to a given document is a fundamental technique for many applications. However, most existing indexing techniques have difficulty addressing this problem due to special properties ...
SESSION: Image search
Integrating hierarchical feature selection and classifier training for multi-label image annotation
Cheng Jin, Chunlei Yang
Pages: 515-524
doi>10.1145/2009916.2009987

It is well accepted that using high-dimensional multi-modal visual features for image content representation and classifier training may achieve a more complete characterization of the diverse visual properties of the images and further result in higher ...
Efficient manifold ranking for image retrieval
Bin Xu, Jiajun Bu, Chun Chen, Deng Cai, Xiaofei He, Wei Liu, Jiebo Luo
Pages: 525-534
doi>10.1145/2009916.2009988

Manifold Ranking (MR), a graph-based ranking algorithm, has been widely applied in information retrieval and shown to have excellent performance and feasibility on a variety of data types. Particularly, it has been successfully applied to content-based ...
Mining weakly labeled web facial images for search-based face annotation
Dayong Wang, Steven C.H. Hoi, Ying He
Pages: 535-544
doi>10.1145/2009916.2009989

In this paper, we investigate a search-based face annotation framework by mining weakly labeled facial images that are freely available on the internet. A key component of such a search-based annotation paradigm is to build a database of facial images ...
SESSION: Indexing
Temporal index sharding for space-time efficiency in archive search
Avishek Anand, Srikanta Bedathur, Klaus Berberich, Ralf Schenkel
Pages: 545-554
doi>10.1145/2009916.2009991

Time-travel queries that couple temporal constraints with keyword queries are useful in searching large-scale archives of time-evolving content such as web archives or wikis. Typical approaches for efficient evaluation of these queries involve slicing ...
Inverted indexes for phrases and strings
Manish Patil, Sharma V. Thankachan, Rahul Shah, Wing-Kai Hon, Jeffrey Scott Vitter, Sabrina Chandrasekaran
Pages: 555-564
doi>10.1145/2009916.2009992

Inverted indexes are the most fundamental and widely used data structures in information retrieval. For each unique word occurring in a document collection, the inverted index stores a list of the documents in which this word occurs. Compression techniques ...
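The abstract defines a word-level inverted index as a map from each unique word to the documents containing it. The minimal sketch below builds exactly that structure. It assumes naive lowercase whitespace tokenization and omits compression. It is not the phrase/string index the paper proposes, and the function name is hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map each unique word to the sorted list of document IDs that contain it."""
    postings: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: lowercase + whitespace split stands in for a real tokenizer.
        for word in text.lower().split():
            postings[word].add(doc_id)
    # Sorted posting lists allow merge-based intersection at query time.
    return {word: sorted(ids) for word, ids in postings.items()}


docs = {1: "inverted indexes store postings", 2: "phrases and strings", 3: "indexes for strings"}
index = build_inverted_index(docs)
```

Here `index["strings"]` is `[2, 3]`. Answering a phrase query such as "for strings" from this index alone would still require positional information, which is part of why phrase and string queries need richer structures.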
Faster temporal range queries over versioned text
Jinru He, Torsten Suel
Pages: 565-574
doi>10.1145/2009916.2009993

Versioned textual collections are collections that retain multiple versions of a document as it evolves over time. Important large-scale examples are Wikipedia and the web collection of the Internet Archive. Search queries over such collections often ...
Indexing strategies for graceful degradation of search quality
Shuai Ding, Sreenivas Gollapudi, Samuel Ieong, Krishnaram Kenthapadi, Alexandros Ntoulas
Pages: 575-584
doi>10.1145/2009916.2009994

Large web search engines process billions of queries each day over tens of billions of documents with often very stringent requirements for a user's search experience, in particular, low latency and highly relevant search results. Index generation and ...
SESSION: Web queries
Incremental diversification for very large sets: a streaming-based approach
Enrico Minack, Wolf Siberski, Wolfgang Nejdl
Pages: 585-594
doi>10.1145/2009916.2009996

Result diversification is an effective method to reduce the risk that none of the returned results satisfies a user's query intention. It has been shown to decrease query abandonment substantially. On the other hand, computing an optimally diverse set ...
Intent-aware search result diversification
Rodrygo L.T. Santos, Craig Macdonald, Iadh Ounis
Pages: 595-604
doi>10.1145/2009916.2009997

Search result diversification has gained momentum as a way to tackle ambiguous queries. An effective approach to this problem is to explicitly model the possible aspects underlying a query, in order to maximise the estimated relevance of the retrieved ...
Parameterized concept weighting in verbose queries
Michael Bendersky, Donald Metzler, W. Bruce Croft
Pages: 605-614
doi>10.1145/2009916.2009998

The majority of the current information retrieval models weight the query concepts (e.g., terms or phrases) in an unsupervised manner, based solely on the collection statistics. In this paper, we go beyond the unsupervised estimation of concept weights, ...
UPS: efficient privacy protection in personalized web search
Gang Chen, He Bai, Lidan Shou, Ke Chen, Yunjun Gao
Pages: 615-624
doi>10.1145/2009916.2009999

In recent years, personalized web search (PWS) has demonstrated effectiveness in improving the quality of search service on the Internet. Unfortunately, the need for collecting private information in PWS has become a major barrier for its wide proliferation. ...
SESSION: Collaborative filtering II
Handling data sparsity in collaborative filtering using emotion and semantic based features
Yashar Moshfeghi, Benjamin Piwowarski, Joemon M. Jose
Pages: 625-634
doi>10.1145/2009916.2010001

Collaborative filtering (CF) aims to recommend items based on prior user interaction. Despite their success, CF techniques do not handle data sparsity well, especially in the case of the cold start problem where there is no past rating for an item. In ...
Fast context-aware recommendations with factorization machines
Steffen Rendle, Zeno Gantner, Christoph Freudenthaler, Lars Schmidt-Thieme
Pages: 635-644
doi>10.1145/2009916.2010002

The situation in which a choice is made is an important information for recommender systems. Context-aware recommenders take this information into account to make predictions. So far, the best performing method for context-aware rating prediction in ...
Filtering semi-structured documents based on faceted feedback
Lanbo Zhang, Yi Zhang, Qianli Xing
Pages: 645-654
doi>10.1145/2009916.2010003

Existing adaptive filtering systems learn user profiles based on users' relevance judgments on documents. In some cases, users have some prior knowledge about what features are important for a document to be relevant. For example, a Spanish speaker may ...
Learning relevance from heterogeneous social network and its application in online targeting
Chi Wang, Rajat Raina, David Fong, Ding Zhou, Jiawei Han, Greg Badros
Pages: 655-664
doi>10.1145/2009916.2010004

The rise of social networking services in recent years presents new research challenges for matching users with interesting content. While the content-rich nature of these social networks offers many cues on "interests" of a user such as text in user-generated ...
SESSION: Latent semantic analysis
ILDA: interdependent LDA model for learning latent aspects and their ratings from online product reviews
Samaneh Moghaddam, Martin Ester
Pages: 665-674
doi>10.1145/2009916.2010006

Today, more and more product reviews become available on the Internet, e.g., product review forums, discussion groups, and Blogs. However, it is almost impossible for a customer to read all of the different and possibly even contradictory opinions and ...
Clickthrough-based latent semantic models for web search
Jianfeng Gao, Kristina Toutanova, Wen-tau Yih
Pages: 675-684
doi>10.1145/2009916.2010007

This paper presents two new document ranking models for Web search based upon the methods of semantic representation and the statistical translation-based approach to information retrieval (IR). Assuming that a query is parallel to the titles of the ...
Regularized latent semantic indexing
Quan Wang, Jun Xu, Hang Li, Nick Craswell
Pages: 685-694
doi>10.1145/2009916.2010008

Topic modeling can boost the performance of information retrieval, but its real-world application is limited due to scalability issues. Scaling to larger document collections via parallelization is an active area of research, but most solutions require ...
SESSION: Multimedia IR
Multimedia answering: enriching text QA with media information
Liqiang Nie, Meng Wang, Zhengjun Zha, Guangda Li, Tat-Seng Chua
Pages: 695-704
doi>10.1145/2009916.2010010

Existing community question-answering forums usually provide only textual answers. However, for many questions, pure texts cannot provide intuitive information, while image or video contents are more appropriate. In this paper, we introduce a scheme ...
Enhancing multi-label music genre classification through ensemble techniques
Chris Sanden, John Z. Zhang
Pages: 705-714
doi>10.1145/2009916.2010011

In the field of Music Information Retrieval (MIR), multi-label genre classification is the problem of assigning one or more genre labels to a music piece. In this work, we propose a set of ensemble techniques, which are specific to the task of multi-label ...
Picasso - to sing, you must close your eyes and draw
Aleksandar Stupar, Sebastian Michel
Pages: 715-724
doi>10.1145/2009916.2010012

We study the problem of automatically assigning appropriate music pieces to a picture or, in general, series of pictures. This task, commonly referred to as soundtrack suggestion, is non-trivial as it requires a lot of human attention and a good deal ...
SESSION: Summarization
Enhanced results for web search
Kevin Haas, Peter Mika, Paul Tarjan, Roi Blanco
Pages: 725-734
doi>10.1145/2009916.2010014

"Ten blue links" have defined web search results for the last fifteen years -- snippets of text combined with document titles and URLs. In this paper, we establish the notion of enhanced search results that extend web search results to include multimedia ...
Summarizing the differences in multilingual news
Xiaojun Wan, Houping Jia, Shanshan Huang, Jianguo Xiao
Pages: 735-744
doi>10.1145/2009916.2010015

There usually exist many news articles written in different languages about a hot news event. The news articles in different languages are written in different ways to reflect different standpoints. For example, the Chinese news agencies and the Western ...
Evolutionary timeline summarization: a balanced optimization framework via iterative substitution
Rui Yan, Xiaojun Wan, Jahna Otterbacher, Liang Kong, Xiaoming Li, Yan Zhang
Pages: 745-754
doi>10.1145/2009916.2010016

Classic news summarization plays an important role with the exponential document growth on the Web. Many approaches are proposed to generate summaries but seldom simultaneously consider evolutionary characteristics of news plus to traditional summary ...
SESSION: Vertical & entity search
Ranking related news predictions
Nattiya Kanhabua, Roi Blanco, Michael Matthews
Pages: 755-764
doi>10.1145/2009916.2010018

We estimate that nearly one third of news articles contain references to future events. While this information can prove crucial to understanding news stories and how events will develop for a given topic, there is currently no easy way to access this ...
Collective entity linking in web text: a graph-based method
Xianpei Han, Le Sun, Jun Zhao
Pages: 765-774
doi>10.1145/2009916.2010019

Entity Linking (EL) is the task of linking name mentions in Web text with their referent entities in a knowledge base. Traditional EL methods usually link name mentions in a document by assuming them to be independent. However, there is often additional ...
From one tree to a forest: a unified solution for structured web data extraction
Qiang Hao, Rui Cai, Yanwei Pang, Lei Zhang
Pages: 775-784
doi>10.1145/2009916.2010020

Structured data, in the form of entities and associated attributes, has been a rich web resource for search engines and knowledge databases. To efficiently extract structured data from enormous websites in various verticals (e.g., books, restaurants), ...
Improving local search ranking through external logs
Klaus Berberich, Arnd Christian König, Dimitrios Lymberopoulos, Peixiang Zhao
Pages: 785-794
doi>10.1145/2009916.2010021

The signals used for ranking in local search are very different from web search: in addition to (textual) relevance, measures of (geographic) distance between the user and the search result, as well as measures of popularity of the result are important ...
SESSION: Query suggestions
Query suggestions in the absence of query logs
Sumit Bhatia, Debapriyo Majumdar, Prasenjit Mitra
Pages: 795-804
doi>10.1145/2009916.2010023

After an end-user has partially input a query, intelligent search engines can suggest possible completions of the partial query to help end-users quickly express their information needs. All major web-search engines and most proposed methods that suggest ...
Synthesizing high utility suggestions for rare web search queries
Alpa Jain, Umut Ozertem, Emre Velipasaoglu
Pages: 805-814
doi>10.1145/2009916.2010024

Search engines are continuously looking into methods to alleviate users' effort in finding desired information. For this, all major search engines employ query suggestions methods to facilitate effective query formulation and reformulation. Providing ...
Post-ranking query suggestion by diversifying search results
Yang Song, Dengyong Zhou, Li-wei He
Pages: 815-824
doi>10.1145/2009916.2010025

Query suggestion refers to the process of suggesting related queries to search engine users. Most existing researches have focused on improving the relevance of suggested queries. In this paper, we introduce the concept of diversifying the content of ...
Automatic boolean query suggestion for professional search
Youngho Kim, Jangwon Seo, W. Bruce Croft
Pages: 825-834
doi>10.1145/2009916.2010026

In professional search environments, such as patent search or legal search, search tasks have unique characteristics: 1) users interactively issue several queries for a topic, and 2) users are willing to examine many retrieval results, i.e., there is ...
SESSION: Linguistic analysis
Improved video categorization from text metadata and user comments
Katja Filippova, Keith B. Hall
Pages: 835-842
doi>10.1145/2009916.2010028

We consider the task of assigning categories (e.g., howto/cooking, sports/basketball, pet/dogs) to YouTube videos from video and text signals. We show that two complementary views on the data -- from the video and text perspectives -- complement each ...
Multifaceted toponym recognition for streaming news
Michael D. Lieberman, Hanan Samet
Pages: 843-852
doi>10.1145/2009916.2010029

News sources on the Web generate constant streams of information, describing many aspects of the events that shape our world. In particular, geography plays a key role in the news, and enabling geographic retrieval of news articles involves recognizing ...
Enriching document representation via translation for improved monolingual information retrieval
Seung-Hoon Na, Hwee Tou Ng
Pages: 853-862
doi>10.1145/2009916.2010030

Word ambiguity and vocabulary mismatch are critical problems in information retrieval. To deal with these problems, this paper proposes the use of translated words to enrich document representation, going beyond the words in the original source language ...
A novel corpus-based stemming algorithm using co-occurrence statistics
Jiaul H. Paik, Dipasree Pal, Swapan K. Parui
Pages: 863-872
doi>10.1145/2009916.2010031

We present a stemming algorithm for text retrieval. The algorithm uses the statistics collected on the basis of certain corpus analysis based on the co-occurrence between two word variants. We use a very simple co-occurrence measure that reflects how ...
SESSION: Clustering
Document clustering with universum
Dan Zhang, Jingdong Wang, Luo Si
Pages: 873-882
doi>10.1145/2009916.2010033

Document clustering is a popular research topic, which aims to partition documents into groups of similar objects (i.e., clusters), and has been widely used in many applications such as automatic topic extraction, document organization and filtering. ...
Identifying points of interest by self-tuning clustering
Yiyang Yang, Zhiguo Gong, Leong Hou U
Pages: 883-892
doi>10.1145/2009916.2010034

Deducing trip related information from web-scale datasets has received very large amounts of attention recently. Identifying points of interest (POIs) in geo-tagged photos is one of these problems. The problem can be viewed as a standard clustering problem ...
Cluster-based fusion of retrieved lists
Anna Khudyak Kozorovitsky, Oren Kurland
Pages: 893-902
doi>10.1145/2009916.2010035

Methods for fusing document lists that were retrieved in response to a query often use retrieval scores (or ranks) of documents in the lists. We present a novel probabilistic fusion approach that utilizes an additional source of rich information, namely, ...
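The abstract says fusion methods often rely only on the retrieval scores of documents in each list. The sketch below shows a classic score-based baseline of that kind: min-max normalization per list followed by CombSUM. It is not the cluster-based probabilistic approach the paper presents, and the function names are hypothetical.

```python
from collections import defaultdict


def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one list's scores to [0, 1]; a constant list maps to 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(lists: list[dict[str, float]]) -> list[tuple[str, float]]:
    """CombSUM: sum each document's normalized scores across all retrieved lists."""
    fused: dict[str, float] = defaultdict(float)
    for scores in lists:
        for doc, s in min_max(scores).items():
            fused[doc] += s
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


fused = comb_sum([{"d1": 10.0, "d2": 5.0, "d3": 0.0}, {"d2": 3.0, "d3": 1.0}])
```

Document `d2` ranks first with a fused score of 1.5 because it appears in both lists. A document missing from a list simply contributes nothing for that list under this scheme.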
SESSION: Effectiveness
System effectiveness, user models, and user utility: a conceptual framework for investigation
Ben Carterette
Pages: 903-912
doi>10.1145/2009916.2010037

There is great interest in producing effectiveness measures that model user behavior in order to better model the utility of a system to its users. These measures are often formulated as a sum over the product of a discount function of ranks and a gain ...
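The abstract describes effectiveness measures as a sum over ranks of a discount function times a gain function. The sketch below writes that general form once and instantiates two familiar measures from it: DCG with exponential gain and a log2 discount, and rank-biased precision with binary gain and a geometric discount. The parameter choices (the DCG variant, p = 0.8) are illustrative assumptions.

```python
import math
from typing import Callable, Sequence


def discounted_measure(rels: Sequence[float],
                       gain: Callable[[float], float],
                       discount: Callable[[int], float]) -> float:
    """Sum over 1-based ranks k of discount(k) * gain(rel_k)."""
    return sum(discount(k) * gain(r) for k, r in enumerate(rels, start=1))


def dcg(rels: Sequence[float]) -> float:
    # One common DCG variant: gain 2^rel - 1, discount 1 / log2(k + 1).
    return discounted_measure(rels, lambda r: 2 ** r - 1, lambda k: 1 / math.log2(k + 1))


def rbp(rels: Sequence[float], p: float = 0.8) -> float:
    # Rank-biased precision: binary gain, discount (1 - p) * p^(k - 1).
    return discounted_measure(rels, lambda r: 1.0 if r > 0 else 0.0,
                              lambda k: (1 - p) * p ** (k - 1))
```

For example, `rbp([1, 0, 1])` equals 0.2 × (1 + 0.64) = 0.328. Swapping only the `discount` argument changes the implied model of how far down the ranking a user reads.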
Evaluating the synergic effect of collaboration in information seeking
Chirag Shah, Roberto González-Ibáñez
Pages: 913-922
doi>10.1145/2009916.2010038

It is typically expected that when people work together, they can often accomplish goals that are difficult or even impossible for individuals. We consider this notion of the group achieving more than the sum of all individuals' achievements to be the ...
Repeatable and reliable search system evaluation using crowdsourcing
Roi Blanco, Harry Halpin, Daniel M. Herzig, Peter Mika, Jeffrey Pound, Henry S. Thompson, Thanh Tran Duc
Pages: 923-932
doi>10.1145/2009916.2010039

The primary problem confronting any new kind of search task is how to boot-strap a reliable and repeatable evaluation campaign, and a crowd-sourcing approach provides many advantages. However, can these crowd-sourced evaluations be repeated over long ...
SESSION: Multilingual IR
Cross-language web page classification via dual knowledge transfer using nonnegative matrix tri-factorization
Hua Wang, Heng Huang, Feiping Nie, Chris Ding
Pages: 933-942
doi>10.1145/2009916.2010041

The lack of sufficient labeled Web pages in many languages, especially for those uncommonly used ones, presents a great challenge to traditional supervised classification methods to achieve satisfactory Web page classification performance. To address ...
No free lunch: brute force vs. locality-sensitive hashing for cross-lingual pairwise similarity
Ferhan Ture, Tamer Elsayed, Jimmy Lin
Pages: 943-952
doi>10.1145/2009916.2010042

This work explores the problem of cross-lingual pairwise similarity, where the task is to extract similar pairs of documents across two different languages. Solutions to this problem are of general interest for text mining in the multi-lingual context ...
An event-centric model for multilingual document similarity
Jannik Strötgen, Michael Gertz, Conny Junghans
Pages: 953-962
doi>10.1145/2009916.2010043

Document similarity measures play an important role in many document retrieval and exploration tasks. Over the past decades, several models and techniques have been developed to determine a ranked list of documents similar to a given query document. ...
SESSION: Efficiency
Posting list intersection on multicore architectures
Shirish Tatikonda, B. Barla Cambazoglu, Flavio P. Junqueira
Pages: 963-972
doi>10.1145/2009916.2010045

In current commercial Web search engines, queries are processed in the conjunctive mode, which requires the search engine to compute the intersection of a number of posting lists to determine the documents matching all query terms. In practice, the intersection ...
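Conjunctive-mode processing, as the abstract describes it, returns only documents that appear in every query term's posting list. The minimal single-threaded sketch below uses a merge-based two-pointer intersection over sorted lists and starts from the shortest list. This is a standard baseline, not the multicore scheme the paper studies, and the function names are hypothetical.

```python
def intersect(a: list[int], b: list[int]) -> list[int]:
    """Two-pointer intersection of two sorted posting lists of doc IDs."""
    i = j = 0
    out: list[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def conjunctive_match(posting_lists: list[list[int]]) -> list[int]:
    """Documents containing all query terms (AND semantics)."""
    if not posting_lists:
        return []
    # Shortest-first ordering keeps intermediate results small.
    ordered = sorted(posting_lists, key=len)
    result = ordered[0]
    for plist in ordered[1:]:
        result = intersect(result, plist)
        if not result:  # early exit once no document can match every term
            break
    return result
```

For instance, `conjunctive_match([[1, 3, 5, 7, 9], [3, 4, 5, 9], [5, 9, 11]])` yields `[5, 9]`. Production engines typically add skip pointers or galloping search so that long lists need not be scanned element by element.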
Timestamp-based result cache invalidation for web search engines
Sadiye Alici, Ismail Sengor Altingovde, Rifat Ozcan, Berkant Barla Cambazoglu, Özgür Ulusoy
Pages: 973-982
doi>10.1145/2009916.2010046

The result cache is a vital component for efficiency of large-scale web search engines, and maintaining the freshness of cached query results is the current research challenge. As a remedy to this problem, our work proposes a new mechanism to identify ...
Energy-price-driven query processing in multi-center web search engines
Enver Kayaaslan, B. Barla Cambazoglu, Roi Blanco, Flavio P. Junqueira, Cevdet Aykanat
Pages: 983-992
doi>10.1145/2009916.2010047

Concurrently processing thousands of web queries, each with a response time under a fraction of a second, necessitates maintaining and operating massive data centers. For large-scale web search engines, this translates into high energy consumption and ...
Faster top-k document retrieval using block-max indexes
Shuai Ding, Torsten Suel
Pages: 993-1002
doi>10.1145/2009916.2010048

Large search engines process thousands of queries per second over billions of documents, making query processing a major performance bottleneck. An important class of optimization techniques called early termination achieves faster query processing by ...
SESSION: Recommender systems
Utilizing marginal net utility for recommendation in e-commerce
Jian Wang, Yi Zhang
Pages: 1003-1012
doi>10.1145/2009916.2010050

Traditional recommendation algorithms often select products with the highest predicted ratings to recommend. However, earlier research in economics and marketing indicates that a consumer usually makes purchase decision(s) based on the product's marginal ...
Recommending ephemeral items at web scale
Ye Chen, John F. Canny
Pages: 1013-1022
doi>10.1145/2009916.2010051

We describe an innovative and scalable recommendation system successfully deployed at eBay. To build recommenders for long-tail marketplaces requires projection of volatile items into a persistent space of latent products. We first present a generative ...
A unified framework for recommendations based on quaternary semantic analysis
Chen Wei, Wynne Hsu, Mong Li Lee
Pages: 1023-1032
doi>10.1145/2009916.2010052

Social network systems such as FaceBook and YouTube have played a significant role in capturing both explicit and implicit user preferences for different items in the form of ratings and tags. This forms a quaternary relationship among users, items, ...
Associative tag recommendation exploiting multiple textual features
Fabiano Belém, Eder Martins, Tatiana Pontes, Jussara Almeida, Marcos Gonçalves
Pages: 1033-1042
doi>10.1145/2009916.2010053

This work addresses the task of recommending relevant tags to a target object by jointly exploiting three dimensions of the problem: (i) term co-occurrence with tags pre-assigned to the target object, (ii) terms extracted from multiple textual features, ...
SESSION: Test collections
Evaluating diversified search results using per-intent graded relevance
Tetsuya Sakai, Ruihua Song
Pages: 1043-1052
doi>10.1145/2009916.2010055

Search queries are often ambiguous and/or underspecified. To accommodate different user needs, search result diversification has received attention in the past few years. Accordingly, several new metrics for evaluating diversification have been proposed, ...
Evaluating multi-query sessions
Evangelos Kanoulas, Ben Carterette, Paul D. Clough, Mark Sanderson
Pages: 1053-1062
doi>10.1145/2009916.2010056

The standard system-based evaluation paradigm has focused on assessing the performance of retrieval systems in serving the best results for a single query. Real users, however, often begin an interaction with a search engine with a sufficiently under-specified ...
Quantifying test collection quality based on the consistency of relevance judgements
Falk Scholer, Andrew Turpin, Mark Sanderson
Pages: 1063-1072
doi>10.1145/2009916.2010057

Relevance assessments are a key component for test collection-based evaluation of information retrieval systems. This paper reports on a feature of such collections that is used as a form of ground truth data to allow analysis of human assessment error. ...
Pseudo test collections for learning web search ranking functions
Nima Asadi, Donald Metzler, Tamer Elsayed, Jimmy Lin
Pages: 1073-1082
doi>10.1145/2009916.2010058

Test collections are the primary drivers of progress in information retrieval. They provide yardsticks for assessing the effectiveness of ranking functions in an automatic, rapid, and repeatable fashion and serve as training data for learning to rank ...
POSTER SESSION: Posters presentations
Parallel learning to rank for information retrieval
Shuaiqiang Wang, Byron J. Gao, Ke Wang, Hady W. Lauw
Pages: 1083-1084
doi>10.1145/2009916.2010060

Learning to rank represents a category of effective ranking methods for information retrieval. While the primary concern of existing research has been accuracy, learning efficiency is becoming an important issue due to the unprecedented availability ...
Learning features through feedback for blog distillation
Dehong Gao, Renxian Zhang, Wenjie Li, Yiu Keung Lau, Kam Fai Wong
Pages: 1085-1086
doi>10.1145/2009916.2010061

The paper is focused on blogosphere research based on the TREC blog distillation task, and aims to explore unbiased and significant features automatically and efficiently. Feedback from faceted feeds is introduced to harvest relevant features and information ...
Time-based relevance models
Mostafa Keikha, Shima Gerani, Fabio Crestani
Pages: 1087-1088
doi>10.1145/2009916.2010062

This paper addresses blog feed retrieval where the goal is to retrieve the most relevant blog feeds for a given user query. Since the retrieval unit is a blog, as a collection of posts, performing relevance feedback techniques and selecting the most ...
Improved query performance prediction using standard deviation
Ronan Cummins, Joemon Jose, Colm O'Riordan
Pages: 1089-1090
doi>10.1145/2009916.2010063

Query performance prediction (QPP) is an important task in information retrieval (IR). In this paper, we (1) develop a new predictor based on the standard deviation of scores in a variable length ranked list, and (2) we show that this new predictor outperforms ...
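The abstract's predictor is the standard deviation of retrieval scores in a ranked list. The sketch below computes the population standard deviation of the top-k scores. The paper uses a variable-length list, but this version assumes a fixed cutoff k for simplicity. The cutoff, the choice of population rather than sample deviation, and the function name are all assumptions.

```python
import statistics


def score_std(scores: list[float], k: int = 100) -> float:
    """Population standard deviation of the top-k retrieval scores.

    Assumption: a fixed cutoff k stands in for the paper's variable-length list.
    """
    top = sorted(scores, reverse=True)[:k]
    if len(top) < 2:
        return 0.0  # the spread is undefined or trivial for fewer than two scores
    return statistics.pstdev(top)
```

Under this predictor, `score_std([4.0, 2.0, 2.0, 4.0], k=4)` is 1.0, while a ranking whose top scores are all equal yields 0.0. The intuition is that a wider score spread suggests the system separates documents more confidently for that query.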
Learning to rank using query-level regression
Jiajin Wu, Zhihao Yang, Yuan Lin, Hongfei Lin, Zheng Ye, Kan Xu
Pages: 1091-1092
doi>10.1145/2009916.2010064

In this paper, we use query-level regression as the loss function. The regression loss function has been used in pointwise methods, however pointwise methods ignore the query boundaries and treat the data equally across queries, and thus the effectiveness ...
Diversifying product search results
Xiangru Chen, Haofen Wang, Xinruo Sun, Junfeng Pan, Yong Yu
Pages: 1093-1094
doi>10.1145/2009916.2010065

In recent years, online shopping is becoming more and more popular. Users type keyword queries on product search systems to find relevant products, accessories, and even related products. However, existing product search systems always return very similar ...
Ad hoc IR: not much room for improvement
Andrew Trotman, David Keeler
Pages: 1095-1096
doi>10.1145/2009916.2010066

Ranking function performance reached a plateau in 1994. The reason for this is investigated. First the performance of BM25 is measured as the proportion of queries satisfied on the first page of 10 results -- it performs well. The performance is then ...
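The abstract measures BM25 as the proportion of queries satisfied on the first page of 10 results, but the truncated text does not define "satisfied". The sketch below assumes that a query counts as satisfied when at least one relevant document appears in the top 10. The data layout and function name are hypothetical.

```python
def fraction_satisfied(run: dict[str, list[str]],
                       qrels: dict[str, set[str]],
                       depth: int = 10) -> float:
    """Fraction of queries with at least one relevant document in the top `depth`.

    run:   query id -> ranked list of document ids
    qrels: query id -> set of relevant document ids
    Assumption: "satisfied" means >= 1 relevant result on the first page.
    """
    if not run:
        return 0.0
    satisfied = sum(
        1 for qid, ranking in run.items()
        if any(doc in qrels.get(qid, set()) for doc in ranking[:depth])
    )
    return satisfied / len(run)


run = {"q1": ["a", "b", "c"], "q2": [f"x{i}" for i in range(20)]}
qrels = {"q1": {"c"}, "q2": {"x15"}}
```

Here `q1` is satisfied (relevant at rank 3) but `q2` is not (relevant at rank 16), so the result is 0.5.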
Image annotation based on recommendation model
Zijia Lin, Guiguang Ding, Jianmin Wang
Pages: 1097-1098
doi>10.1145/2009916.2010067

In this paper, a novel approach based on recommendation model is proposed for automatic image annotation. For any to-be-annotated image, we first select some related images with tags from training dataset according to their visual similarity. And then ...
Utilizing minimal relevance feedback for ad hoc retrieval
Eyal Krikon, Oren Kurland
Pages: 1099-1100
doi>10.1145/2009916.2010068

Using relevance feedback can significantly improve (ad hoc) retrieval effectiveness. Yet, if little feedback is available, effectively exploiting it is a challenge. To that end, we present a novel approach that utilizes document passages. Empirical evaluation ...
Sense discrimination for physics retrieval
Christina Lioma, Alok Kothari, Hinrich Schuetze
Pages: 1101-1102
doi>10.1145/2009916.2010069

Information Retrieval in technical domains like physics is characterised by long and precise queries, whose meaning is strongly influenced by term context and domain. We treat this as a disambiguation problem, and present initial findings of a retrieval ...
When documents are very long, BM25 fails!
Yuanhua Lv, ChengXiang Zhai
Pages: 1103-1104
doi>10.1145/2009916.2010070

We reveal that the Okapi BM25 retrieval function tends to overly penalize very long documents. To address this problem, we present a simple yet effective extension of BM25, namely BM25L, which "shifts" the term frequency normalization formula to boost ...
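The abstract says BM25L "shifts" BM25's term frequency normalization to avoid over-penalizing very long documents. The sketch below assumes the shift is a constant δ added to the length-normalized term frequency c' = tf / (1 − b + b·|d|/avgdl) before BM25's saturation, applied only when tf > 0. It compares only the per-term TF components and omits IDF. The values k1 = 1.2, b = 0.75, and δ = 0.5 are assumed defaults, and the function names are hypothetical.

```python
def bm25_tf(tf: float, doc_len: float, avg_len: float,
            k1: float = 1.2, b: float = 0.75) -> float:
    """BM25 TF component written as (k1 + 1) c' / (k1 + c')."""
    c = tf / (1 - b + b * doc_len / avg_len)  # length-normalized term frequency
    return (k1 + 1) * c / (k1 + c)


def bm25l_tf(tf: float, doc_len: float, avg_len: float,
             k1: float = 1.2, b: float = 0.75, delta: float = 0.5) -> float:
    """BM25L-style TF component: shift c' by delta when the term occurs at all."""
    if tf == 0:
        return 0.0  # absent terms still contribute nothing
    c = tf / (1 - b + b * doc_len / avg_len)
    return (k1 + 1) * (c + delta) / (k1 + c + delta)


# A term occurring twice in a document 20x longer than average.
long_bm25 = bm25_tf(2, doc_len=10_000, avg_len=500)
long_bm25l = bm25l_tf(2, doc_len=10_000, avg_len=500)
```

In the long-document example, c' ≈ 0.13. BM25's component is therefore about 0.22, while the shifted version is about 0.76, so a matching term in a very long document is no longer scored almost as if it were absent.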
Location and timeliness of information sources during news events
Elad Yom-Tov, Fernando Diaz
Pages: 1105-1106
doi>10.1145/2009916.2010071

People nowadays can obtain information on current news events through media outlets, social media, and by actively seeking information using search engines. In this paper we investigate the temporal relationship between news coverage by media outlets, ...
What deliberately degrading search quality tells us about discount functions
Paul Thomas, Timothy Jones, David Hawking
Pages: 1107-1108
doi>10.1145/2009916.2010072

Deliberate degradation of search results is a common tool in user experiments. We degrade high-quality search results by inserting non-relevant documents at different ranks. The effect of these manipulations, on a number of commonly-used metrics, is ...
Collective topic modeling for heterogeneous networks
Hongbo Deng, Bo Zhao, Jiawei Han
Pages: 1109-1110
doi>10.1145/2009916.2010073

In this paper, we propose a joint probabilistic topic model for simultaneously modeling the contents of multi-typed objects of a heterogeneous information network. The intuition behind our model is that different objects of the heterogeneous network ...
Graph-cut based tag enrichment
Xueming Qian, Xian-Sheng Hua
Pages: 1111-1112
doi>10.1145/2009916.2010074

In this paper, a graph cut based tag enrichment approach is proposed. We build a graph for each image with its initial tags. The graph has two terminals. Nodes of the graph are fully connected with each other. Min-cut/max-flow algorithm is utilized ...
Personalized social query expansion using social bookmarking systems
Mohamed Reda Bouadjenek, Hakim Hacid, Mokrane Bouzeghoub, Johann Daigremont
Pages: 1113-1114
doi>10.1145/2009916.2010075

We propose a new approach for social and personalized query expansion using social structures in the Web 2.0. While focusing on social tagging systems, the proposed approach considers (i) the semantic similarity between tags composing a query, (ii) a ...
What are the real differences of children's and adults' web search
Tatiana Gossen, Thomas Low, Andreas Nürnberger
Pages: 1115-1116
doi>10.1145/2009916.2010076

We present first results of a logfile analysis on web search engines for children. The aim of this research is to analyse fundamental facts about how children's web search behaviour differs from that of adults. We show differences to previous results, ...
Cognitive coordinating behaviors in multitasking web search
Jia Tina Du
Pages: 1117-1118
doi>10.1145/2009916.2010077

This paper investigates how users cognitively coordinate multitasking Web search across different information search problems. The analysis suggests that (1) multitasking is a prevalent Web search behavior including both sequential multitasking (31%) ...
Optimizing multimodal reranking for web image search
Hao Li, Meng Wang, Zhisheng Li, Zheng-Jun Zha, Jialie Shen
Pages: 1119-1120
doi>10.1145/2009916.2010078

In this poster, we introduce a web image search reranking approach with exploring multiple modalities. Different from the conventional methods that build graph with one feature set for reranking, our approach integrates multiple feature sets that describe ...
Multi-layer graph-based semi-supervised learning for large-scale image datasets using mapreduce
Wen-Yu Lee, Liang-Chi Hsieh, Guan-Long Wu, Winston Hsu, Ya-Fan Su
Pages: 1121-1122
doi>10.1145/2009916.2010079

Semi-supervised learning is to exploit the vast amount of unlabeled data in the world. This paper proposes a scalable graph-based technique leveraging the distributed computing power of the MapReduce programming model. For a higher quality of learning, ...
Tackling class imbalance and data scarcity in literature-based gene function annotation
Mathieu Blondel, Kazuhiro Seki, Kuniaki Uehara
Pages: 1123-1124
doi>10.1145/2009916.2010080

In recent years, a number of machine learning approaches to literature-based gene function annotation have been proposed. However, due to issues such as lack of labeled data, class imbalance and computational cost, they have usually been unable to surpass ...
Bootstrapping subjectivity detection
Valentin Jijkoun, Maarten de Rijke
Pages: 1125-1126
doi>10.1145/2009916.2010081
Full text: PDF

We describe a method for automatically generating subjectivity clues for a specific topic and a set of (relevant) documents, and evaluate it on the task of classifying sentences w.r.t. subjectivity, with improvements over previous work.
The effects of choice in routing relevance judgments
Edith Law, Paul N. Bennett, Eric Horvitz
Pages: 1127-1128
doi>10.1145/2009916.2010082
Full text: PDF

The emergence of human computation systems, including Mechanical Turk and games with a purpose, has made it feasible to distribute relevance judgment tasks to workers over the Web. Most human computation systems assign tasks to individuals randomly, ...
Statistical feature extraction for cross-language web content quality assessment
Guang-Gang Geng, Xiao-Dong Li, Li-Ming Wang, Wei Wang, Shuo Shen
Pages: 1129-1130
doi>10.1145/2009916.2010083
Full text: PDF

Web content quality assessment is a typical static ranking problem. Statistical systems based on heuristic content features and TF-IDF features have proven effective for Web content quality assessment. However, they all rely on language-dependent features, which are not suitable ...
Exploiting endorsement information and social influence for item recommendation
Cheng-Te Li, Shou-De Lin, Man-Kwan Shan
Pages: 1131-1132
doi>10.1145/2009916.2010084
Full text: PDF

Social networking services possess two features: (1) capturing the social relationships among people, represented by the social network, and (2) allowing users to express their preferences on different kinds of items (e.g. photo, celebrity, pages) through ...
Modeling subset distributions for verbose queries
Xiaobing Xue, W. Bruce Croft
Pages: 1133-1134
doi>10.1145/2009916.2010085
Full text: PDF

Improving verbose (or long) queries poses a new challenge for search systems. Previous techniques mainly focused on two aspects: weighting the important words or phrases and selecting the best subset query. The former does not consider how words and ...
Domain expert topic familiarity and search behavior
Sarvnaz Karimi, Falk Scholer, Adam Clark, Sadegh Kharazmi
Pages: 1135-1136
doi>10.1145/2009916.2010086
Full text: PDF

Users of information retrieval systems employ a variety of strategies when searching for information. One factor that can directly influence how searchers go about their information finding task is the level of familiarity with a search topic. We investigate ...
Sample selection for dictionary-based corpus compression
Christopher Hoobin, Simon Puglisi, Justin Zobel
Pages: 1137-1138
doi>10.1145/2009916.2010087
Full text: PDF

Compression of large text corpora has the potential to drastically reduce both storage requirements and per-document access costs. Adaptive methods used for general-purpose compression are ineffective for this application, and historically the most successful ...
Evaluating medical information retrieval
Bevan Koopman, Peter Bruza, Laurianne Sitbon, Michael Lawley
Pages: 1139-1140
doi>10.1145/2009916.2010088
Full text: PDF

This paper presents a framework for evaluating information retrieval of medical records. We use the BLULab corpus, a large collection of real-world de-identified medical records. The collection has been hand coded by clinical terminologists using the ...
Region-based landmark discovery by crowdsourcing geo-referenced photos
Yen-Ta Huang, An-Jung Cheng, Liang-Chi Hsieh, Winston Hsu, Kuo-Wei Chang
Pages: 1141-1142
doi>10.1145/2009916.2010089
Full text: PDF

We propose a novel model for landmark discovery that locates region-based landmarks on a map, in contrast to the traditional point-based landmarks. The proposed method preserves more information and automatically identifies candidate regions on the map by crowdsourcing ...
Towards effective short text deep classification
Xinruo Sun, Haofen Wang, Yong Yu
Pages: 1143-1144
doi>10.1145/2009916.2010090
Full text: PDF

Recently, more and more short texts (e.g., ads, tweets) have appeared on the Web. Classifying short texts into a large taxonomy such as ODP or the Wikipedia category system has become an important mining task to improve the performance of many applications such as ...
Temporal latent semantic analysis for collaboratively generated content: preliminary results
Yu Wang, Eugene Agichtein
Pages: 1145-1146
doi>10.1145/2009916.2010091
Full text: PDF

Latent semantic analysis (LSA) has been intensively studied because of its wide application to Information Retrieval and Natural Language Processing. Yet, traditional models such as LSA only examine one (current) version of the document. However, due ...
Self-adjusting hybrid recommenders based on social network analysis
Alejandro Bellogin, Pablo Castells, Ivan Cantador
Pages: 1147-1148
doi>10.1145/2009916.2010092
Full text: PDF

Ensemble recommender systems successfully enhance recommendation accuracy by exploiting different sources of user preferences, such as ratings and social contacts. In linear ensembles, the optimal weight of each recommender strategy is commonly tuned ...
BlogCast effect on information diffusion in a blogosphere
Sang-Wook Kim, Christos Faloutsos, Jiwoon Ha
Pages: 1149-1150
doi>10.1145/2009916.2010093
Full text: PDF

A blog service company provides a function named BlogCast that exposes quality posts on the blog main page to vitalize a blogosphere. This paper analyzes a new type of information diffusion via BlogCast. We show that there exists a strong halo effect ...
Product comparison using comparative relations
Si Li, Zheng-Jun Zha, Zhaoyan Ming, Meng Wang, Tat-Seng Chua, Jun Guo, Weiran Xu
Pages: 1151-1152
doi>10.1145/2009916.2010094
Full text: PDF

This paper proposes a novel Product Comparison approach. The comparative relations between products are first mined from both user reviews on multiple review websites and community-based question answering pairs containing product comparison information. ...
Collaborative cyberporn filtering with collective intelligence
Lung-Hao Lee, Hsin-Hsi Chen
Pages: 1153-1154
doi>10.1145/2009916.2010095
Full text: PDF

This paper presents a user intent method to generate blacklists for collaborative cyberporn filtering. A novel porn detection framework that finds new pornographic web pages by mining user search behaviors is proposed. It employs users' clicks in search ...
Do IR models satisfy the TDC retrieval constraint
Stéphane Clinchant, Eric Gaussier
Pages: 1155-1156
doi>10.1145/2009916.2010096
Full text: PDF
On diversifying and personalizing web search
David Vallet, Pablo Castells
Pages: 1157-1158
doi>10.1145/2009916.2010097
Full text: PDF

Diversification and personalization methods are common approaches to deal with the one-size-fits-all paradigm of Web search engines. We performed a user study with 190 subjects where we analyzed the effects of diversification and personalization methods ...
Semantic tag recommendation using concept model
Chenliang Li, Anwitaman Datta, Aixin Sun
Pages: 1159-1160
doi>10.1145/2009916.2010098
Full text: PDF

The common tags given by multiple users to a particular document are often semantically relevant to the document and each tag represents a specific topic. In this paper, we attempt to emulate human tagging behavior to recommend tags by considering the ...
Recommending interesting activity-related local entities
Jie Tang, Ryen W. White, Peter Bailey
Pages: 1161-1162
doi>10.1145/2009916.2010099
Full text: PDF

When searching for entities with a strong local character (e.g., a museum), people may also be interested in discovering proximal activity-related entities (e.g., a café). Geographical proximity is a necessary, but not sufficient, qualifier for ...
Cross-corpus relevance projection
Nima Asadi, Donald Metzler, Jimmy Lin
Pages: 1163-1164
doi>10.1145/2009916.2010100
Full text: PDF
Location disambiguation for geo-tagged images
Zhu Zhu, Lidan Shou, Kuang Mao, Gang Chen
Pages: 1165-1166
doi>10.1145/2009916.2010101
Full text: PDF

In this poster, we address the problem of location disambiguation for geotagged Web photo resources. We propose an approach for analyzing and partitioning large geotagged photo collections using geographic and semantic information. By organizing the ...
Towards an indexing method to speed-up music retrieval
Benjamin Martin, Pierre Hanna, Matthias Robine, Pascal Ferraro
Pages: 1167-1168
doi>10.1145/2009916.2010102
Full text: PDF

Computations in most music retrieval systems strongly depend on the size of the data being compared. We propose to enhance the performance of a music retrieval system, namely a harmonic similarity evaluation method, by first indexing relevant parts of music pieces. ...
An investigation of decompounding for cross-language patent search
Johannes Leveling, Walid Magdy, Gareth J.F. Jones
Pages: 1169-1170
doi>10.1145/2009916.2010103
Full text: PDF

Decompounding has been found to improve information retrieval (IR) effectiveness in general domains for languages such as German or Dutch. We investigate if cross-language patent retrieval can profit from decompounding. This poses two challenges: i) ...
Detecting seasonal queries by time-series analysis
Milad Shokouhi
Pages: 1171-1172
doi>10.1145/2009916.2010104
Full text: PDF

Seasonal events such as Halloween and Christmas repeat every year and initiate several temporal information needs. The impact of such events on users is often reflected in search logs in the form of seasonal spikes in the frequency of related queries (e.g. ...
Learning to rank under tight budget constraints
Christian Pölitz, Ralf Schenkel
Pages: 1173-1174
doi>10.1145/2009916.2010105
Full text: PDF

This paper investigates the influence of pruning feature lists to keep a given budget for the evaluation of ranking methods. We learn from a given training set how important the individual prefixes are for the ranking quality. Based on their importance ...
A novel hybrid index structure for efficient text retrieval
Andreas Broschart, Ralf Schenkel
Pages: 1175-1176
doi>10.1145/2009916.2010106
Full text: PDF

Query processing with precomputed term pair lists can improve efficiency for some queries, but suffers from the quadratic number of index lists that need to be read. We present a novel hybrid index structure that aims at decreasing the number of index ...
A weighted curve fitting method for result merging in federated search
Chuan He, Dzung Hong, Luo Si
Pages: 1177-1178
doi>10.1145/2009916.2010107
Full text: PDF

Result merging is an important step in federated search to merge the documents returned from multiple source-specific ranked lists for a user query. Previous result merging methods such as Semi-Supervised Learning (SSL) and Sample-Agglomerate Fitting ...
Effect of different docid orderings on dynamic pruning retrieval strategies
Nicola Tonellotto, Craig Macdonald, Iadh Ounis
Pages: 1179-1180
doi>10.1145/2009916.2010108
Full text: PDF

Document-at-a-time (DAAT) dynamic pruning strategies for information retrieval systems such as MaxScore and Wand can increase querying efficiency without decreasing effectiveness. Both work on posting lists sorted by ascending document identifier (docid). ...
Time-based query performance predictors
Nattiya Kanhabua, Kjetil Nørvåg
Pages: 1181-1182
doi>10.1145/2009916.2010109
Full text: PDF

Query performance prediction is aimed at predicting the retrieval effectiveness that a query will achieve with respect to a particular ranking model. In this paper, we study query performance prediction for a ranking model that explicitly incorporates ...
Search task difficulty: the expected vs. the reflected
Jingjing Liu, Nicholas J. Belkin
Pages: 1183-1184
doi>10.1145/2009916.2010110
Full text: PDF

We report findings on how the user's perception of task difficulty changes before and after searching for information to solve tasks. We found that while in one type of task, the dependent task, this did not change, in another, the parallel task, it ...
On the suitability of diversity metrics for learning-to-rank for diversity
Rodrygo L.T. Santos, Craig Macdonald, Iadh Ounis
Pages: 1185-1186
doi>10.1145/2009916.2010111
Full text: PDF

An optimally diverse ranking should achieve the maximum coverage of the aspects underlying an ambiguous or underspecified query, with minimum redundancy with respect to the covered aspects. Although evaluation metrics that reward coverage and penalise ...
How diverse are web search results?
Rodrygo L.T. Santos, Craig Macdonald, Iadh Ounis
Pages: 1187-1188
doi>10.1145/2009916.2010112
Full text: PDF

Search result diversification has recently gained attention as a means to tackle ambiguous queries. While query ambiguity is of particular concern for the short queries commonly observed in a Web search scenario, it is unclear how much diversity is actually ...
Analysis of an expert search query log
Yi Fang, Naveen Somasundaram, Luo Si, Jeongwoo Ko, Aditya P. Mathur
Pages: 1189-1190
doi>10.1145/2009916.2010113
Full text: PDF

Expert search has made rapid progress in modeling, algorithms and evaluations in recent years. However, there has been very little work on analyzing how users interact with expert search systems. In this paper, we conduct an analysis of an expert search query ...
A model for expert finding in social networks
Elena Smirnova
Pages: 1191-1192
doi>10.1145/2009916.2010114
Full text: PDF

Expert finding is the task of finding knowledgeable people on a given topic. State-of-the-art expertise retrieval algorithms identify matching experts based on analysis of the textual content of documents that experts are associated with. While powerful, these ...
Transductive learning over automatically detected themes for multi-document summarization
Massih-Reza Amini, Nicolas Usunier
Pages: 1193-1194
doi>10.1145/2009916.2010115
Full text: PDF

We propose a new method for query-biased multi-document summarization, based on sentence extraction. The summary of multiple documents is created in two steps. Sentences are first clustered, where each cluster corresponds to one of the main themes present ...
Rating-based collaborative filtering combined with additional regularization
Shu Wu, Shengrui Wang
Pages: 1195-1196
doi>10.1145/2009916.2010116
Full text: PDF

The collaborative filtering (CF) approach to recommender systems has received much attention recently. However, previous work mainly focuses on improving the formula of rating prediction, e.g. by adding user and item biases, implicit feedback and time-aware ...
Words-of-interest selection based on temporal motion coherence for video retrieval
Lei Wang, Dawei Song, Eyad Elyan
Pages: 1197-1198
doi>10.1145/2009916.2010117
Full text: PDF

The "Bag of Visual Words" (BoW) framework has been widely used in query-by-example video retrieval to model the visual content by a set of quantized local feature descriptors. In this paper, we propose a novel technique to enhance BoW by the selection ...
Aggregating multiple opinion evidence in proximity-based opinion retrieval
Shima Gerani, Mostafa Keikha, Fabio Crestani
Pages: 1199-1200
doi>10.1145/2009916.2010118
Full text: PDF

Blog post opinion retrieval is the problem of ranking blog posts according to the likelihood that the post is relevant to the query and that the author was expressing an opinion about the topic (of the query). A recent study has proposed a method for ...
Enhancing mobile search using web search log data
Yoshiyuki Inagaki, Jiang Bian, Yi Chang, Motoko Maki
Pages: 1201-1202
doi>10.1145/2009916.2010119
Full text: PDF

Mobile search is still in its infancy compared with general-purpose web search. With limited training data and weak relevance features, the ranking performance in mobile search is far from satisfactory. To address this problem, we propose to leverage the ...
Award prediction with temporal citation network analysis
Zaihan Yang, Dawei Yin, Brian D. Davison
Pages: 1203-1204
doi>10.1145/2009916.2010120
Full text: PDF

Each year many ACM SIG communities recognize an outstanding researcher through an award in honor of his or her profound impact and numerous research contributions. This work is the first to investigate an automated mechanism to help in selecting ...
Rating prediction using feature words extracted from customer reviews
Masanao Ochi, Makoto Okabe, Rikio Onai
Pages: 1205-1206
doi>10.1145/2009916.2010121
Full text: PDF

We developed a simple method of improving the accuracy of rating prediction using feature words extracted from customer reviews. Many rating predictors work well for a small and dense dataset of customer reviews. However, a practical dataset tends to ...
Ranking tags in resource collections
Dimitrios Skoutas, Mohammad Alrifai
Pages: 1207-1208
doi>10.1145/2009916.2010122
Full text: PDF

We examine different tag ranking strategies for constructing tag clouds to represent collections of tagged objects. The proposed methods are based on random walk on graphs, diversification, and rank aggregation, and they are empirically evaluated on ...
Identifying similar people in professional social networks with discriminative probabilistic models
Suleyman Cetintas, Monica Rogati, Luo Si, Yi Fang
Pages: 1209-1210
doi>10.1145/2009916.2010123
Full text: PDF

Identifying similar professionals is an important task for many core services in professional social networks. Information about users can be obtained from heterogeneous information sources, and different sources provide different insights on user similarity. ...
Intent-oriented diversity in recommender systems
Saul Vargas, Pablo Castells, David Vallet
Pages: 1211-1212
doi>10.1145/2009916.2010124
Full text: PDF

Diversity as a relevant dimension of retrieval quality is receiving increasing attention in the Information Retrieval and Recommender Systems (RS) fields. The problem has nonetheless been approached under different views and formulations in IR and RS ...
Disambiguating biomedical acronyms using EMIM
Nut Limsopatham, Rodrygo L.T. Santos, Craig Macdonald, Iadh Ounis
Pages: 1213-1214
doi>10.1145/2009916.2010125
Full text: PDF

Expanding a query with acronyms or their corresponding 'long-forms' has not been shown to provide consistent improvements in the biomedical IR literature. The major open issue with expanding acronyms in a query is their inherent ambiguity, as an acronym ...
Best document selection based on approximate utility optimization
Hungyu Henry Lin, Yi Zhang, James Davis
Pages: 1215-1216
doi>10.1145/2009916.2010126
Full text: PDF

This poster describes an alternative approach to handling the best document selection problem. Best document selection is a common problem with many real world applications, but is not a well studied problem by itself; a simple solution would be to treat ...
Forecasting counts of user visits for online display advertising with probabilistic latent class models
Suleyman Cetintas, Datong Chen, Luo Si, Bin Shen, Zhanibek Datbayev
Pages: 1217-1218
doi>10.1145/2009916.2010127
Full text: PDF

Display advertising is a multi-billion dollar industry where advertisers promote their products to users by having publishers display their advertisements on popular Web pages. An important problem in online advertising is how to forecast the number ...
Knowledge effects on document selection in search results pages
Michael J. Cole, Xiangmin Zhang, Chang Liu, Nicholas J. Belkin, Jacek Gwizdka
Pages: 1219-1220
doi>10.1145/2009916.2010128
Full text: PDF

Click-through events in search result pages (SERPs) are not reliable implicit indicators of document relevance. A user's task and domain knowledge are key factors in recognition and link selection, and the most useful SERP document links may be those ...
Learning to rank from a noisy crowd
Abhimanu Kumar, Matthew Lease
Pages: 1221-1222
doi>10.1145/2009916.2010129
Full text: PDF

We study how to best use crowdsourced relevance judgments for learning to rank [1, 7]. We integrate two lines of prior work: unreliable crowd-based binary annotation for binary classification [5, 3], and aggregating graded relevance judgments from reliable ...
How to count thumb-ups and thumb-downs?: an information retrieval approach to user-rating based ranking of items
Dell Zhang, Robert Mao, Haitao Li, Joanne Mao
Pages: 1223-1224
doi>10.1145/2009916.2010130
Full text: PDF

It is a common practice among Web 2.0 services to allow users to rate items on their sites. In this paper, we first point out the flaws of the popular methods for user-rating based ranking of items, and then argue that two well-known Information Retrieval ...
Predicting users' domain knowledge from search behaviors
Xiangmin Zhang, Michael Cole, Nicholas Belkin
Pages: 1225-1226
doi>10.1145/2009916.2010131
Full text: PDF

This study uses regression modeling to predict a user's domain knowledge level (DK) from implicit evidence provided by certain search behaviors. A user study (n=35) with recall-oriented search tasks in the genomic domain was conducted. A number of regression ...
The interactive PRP for diversifying document rankings
Guido Zuccon, Leif Azzopardi, C.J. "Keith" van Rijsbergen
Pages: 1227-1228
doi>10.1145/2009916.2010132
Full text: PDF

The assumptions underlying the Probability Ranking Principle (PRP) have led to a number of alternative approaches that cater or compensate for the PRP's limitations. In this poster we focus on the Interactive PRP (iPRP), which rejects the assumption ...
Detecting success in mobile search from interaction
Qi Guo, Shuai Yuan, Eugene Agichtein
Pages: 1229-1230
doi>10.1145/2009916.2010133
Full text: PDF

Predicting searcher success and satisfaction is a key problem in Web search, which is essential for automatically evaluating and improving search engine performance. This problem has been studied actively in the desktop search setting, but not specifically ...
Measuring assessor accuracy: a comparison of nist assessors and user study participants
Mark D. Smucker, Chandra Prakash Jethani
Pages: 1231-1232
doi>10.1145/2009916.2010134
Full text: PDF

In many situations, humans judging document relevance are forced to trade off accuracy for speed. The development of better interactive retrieval systems and relevance assessing platforms requires the measurement of assessor accuracy, but to date the ...
A bipartite graph based social network splicing method for person name disambiguation
Jintao Tang, Qin Lu, Ting Wang, Ji Wang, Wenjie Li
Pages: 1233-1234
doi>10.1145/2009916.2010135
Full text: PDF

The key issue of person name disambiguation is to discover different namesakes in massive web documents rather than simply cluster documents by using textual features. In this paper, we describe a novel person name disambiguation method based on social ...
Link formation analysis in microblogs
Dawei Yin, Liangjie Hong, Xiong Xiong, Brian D. Davison
Pages: 1235-1236
doi>10.1145/2009916.2010136
Full text: PDF

Unlike a traditional social network service, a microblogging network like Twitter is a hybrid network, combining aspects of both social networks and information networks. Understanding the structure of such hybrid networks and predicting new links are ...
Evolution of web search results within years
Ismail Sengor Altingovde, Rifat Ozcan, Özgür Ulusoy
Pages: 1237-1238
doi>10.1145/2009916.2010137
Full text: PDF

We provide a first large-scale analysis of the evolution of query results obtained from a real search engine at two distant points in time, namely, in 2007 and 2010, for a set of 630,000 real queries.
Decayed DivRank: capturing relevance, diversity and prestige in information networks
Pan Du, Jiafeng Guo, Xue-Qi Cheng
Pages: 1239-1240
doi>10.1145/2009916.2010138
Full text: PDF

Many network-based ranking approaches have been proposed to rank objects according to different criteria, including relevance, prestige and diversity. However, existing approaches either only aim at one or two of the criteria, or handle them with additional ...
Multi-objective optimization in learning to rank
Na Dai, Milad Shokouhi, Brian D. Davison
Pages: 1241-1242
doi>10.1145/2009916.2010139
Full text: PDF

Supervised learning to rank algorithms typically optimize for high relevance and ignore other facets of search quality, such as freshness and diversity. Prior work on multi-objective ranking trained rankers focused on using hybrid labels that combine ...
A large-scale study of the effect of training set characteristics over learning-to-rank algorithms
Evangelos Kanoulas, Stefan Savev, Pavel Metrikov, Virgil Pavlu, Javed Aslam
Pages: 1243-1244
doi>10.1145/2009916.2010140
Full text: PDF

In this work we describe the results of a large-scale study on the effect of the distribution of labels across the different grades of relevance in the training set on the performance of trained ranking functions. In a controlled experiment we generate ...
Exploring term temporality for pseudo-relevance feedback
Stewart Whiting, Yashar Moshfeghi, Joemon M. Jose
Pages: 1245-1246
doi>10.1145/2009916.2010141
Full text: PDF

As digital collections expand, the importance of the temporal aspect of information has become increasingly apparent. The aim of this paper is to investigate the effect of using long-term temporal profiles of terms in information retrieval by enhancing ...
MSSF: a multi-document summarization framework based on submodularity
Jingxuan Li, Lei Li, Tao Li
Pages: 1247-1248
doi>10.1145/2009916.2010142
Full text: PDF

Multi-document summarization aims to distill the most representative information from a set of documents to generate a summary. Given a set of documents as input, most existing multi-document summarization approaches utilize different sentence selection ...
SEJoin: an optimized algorithm towards efficient approximate string searches
Junfeng Zhou, Ziyang Chen, Jingrong Zhang
Pages: 1249-1250
doi>10.1145/2009916.2010143
Full text: PDF

We investigated the problem of finding from a collection of strings those similar to a given query string based on edit distance, for which the critical operation is merging inverted lists of grams generated from the collection of strings. We present ...
Bag-of-visual-words vs global image descriptors on two-stage multimodal retrieval
Konstantinos Zagoris, Savvas A. Chatzichristofis, Avi Arampatzis
Pages: 1251-1252
doi>10.1145/2009916.2010144
Full text: PDF

The Bag-Of-Visual-Words (BOVW) paradigm is fast becoming a popular image representation for Content-Based Image Retrieval (CBIR), mainly because of its better retrieval effectiveness over global feature representations on collections with images being ...
Query term ranking based on search results overlap
Wei Song, Yu Zhang, Yubin Xie, Ting Liu, Sheng Li
Pages: 1253-1254
doi>10.1145/2009916.2010145
Full text: PDF

In this paper, we propose a method to rank and assign weights to query terms according to their impact on the topic of the query. We use Search Result Overlap Ratio (SROR) to quantify the overlap of the search results of the full query and a shorten ...
Tossing coins to trim long queries
Sudip Datta, Vasudeva Varma
Pages: 1255-1256
doi>10.1145/2009916.2010146
Full text: PDF

Verbose web queries are often descriptive in nature where a term based search engine is unable to distinguish between the essential and noisy words, which can result in a drift from the user intent. We present a randomized query reduction technique that ...
A comparison of time-aware ranking methods
Nattiya Kanhabua, Kjetil Nørvåg
Pages: 1257-1258
doi>10.1145/2009916.2010147
Full text: PDF

When searching a temporal document collection, e.g., news archives or blogs, the time dimension must be explicitly incorporated into a retrieval model in order to improve relevance ranking. Previous work has followed one of two main approaches: 1) a ...
Learning for graphs with annotated edges
Fan Li
Pages: 1259-1260
doi>10.1145/2009916.2010148
Full text: PDF

Automatic classification with graphs containing annotated edges is an interesting problem and has many potential applications. We present a risk minimization formulation that exploits the annotated edges for classification tasks. One major advantage ...
Formulating effective questions for community-based question answering
Saori Suzuki, Shin'ichi Nakayama, Hideo Joho
Pages: 1261-1262
doi>10.1145/2009916.2010149
Full text: PDF

Community-based Question Answering (CQA) services have become a major venue for people's information seeking on the Web. However, many studies on CQA have focused on the prediction of the best answers for a given question. This paper looks into ...
DEMONSTRATION SESSION: Demonstrations
ClusteringWiki: personalized and collaborative clustering of search results
Dragos C. Anastasiu, Byron J. Gao, David Buttler
Pages: 1263-1264
doi>10.1145/2009916.2010151
Full text: PDF

How to organize and present search results plays a critical role in the utility of search engines. Due to the unprecedented scale of the Web and diversity of search results, the common strategy of ranked lists has become increasingly inadequate, and ...
OrientSTS: spatio-temporal sequence searching in flickr
Chunjie Zhou, Dongqi Liu, Xiaofeng Meng
Pages: 1265-1266
doi>10.1145/2009916.2010152
Full text: PDF

Nowadays, due to the increasing user requirements of efficient and personalized services, a perfect travel plan is urgently needed. However, at present it is hard for people to make a personalized traveling plan. Most of them follow other people's general ...
A toolkit for knowledge base population
Zheng Chen, Suzanne Tamang, Adam Lee, Heng Ji
Pages: 1267-1268
doi>10.1145/2009916.2010153
Full text: PDF

The main goal of knowledge base population (KBP) is to distill entity information (e.g., facts of a person) from multiple unstructured and semi-structured data sources, and incorporate the information into a knowledge base (KB). In this work, we intend ...
iMecho: a context-aware desktop search system
Jidong Chen, Hang Guo, Wentao Wu, Wei Wang
Pages: 1269-1270
doi>10.1145/2009916.2010154
Full text: PDF

In this demo, we present iMecho, a context-aware desktop search system to help users get more relevant results. Different from other desktop search engines, iMecho ranks results not only by the content of the query, but also the context of the query. ...
Visualizing and querying semantic social networks
Aixin Sun, Anwitaman Datta, Ee-Peng Lim, Kuiyu Chang
Pages: 1271-1272
doi>10.1145/2009916.2010155
Full text: PDF

We demonstrate SSNetViz, which was developed for integrating, visualizing and querying heterogeneous semantic social networks obtained from multiple information sources. A semantic social network refers to a social network graph with multi-typed nodes and ...
What-you-retrieve-is-what-you-see: a preliminary cyber-physical search engine
Lidan Shou, Ke Chen, Gang Chen, Chao Zhang, Yi Ma, Xian Zhang
Pages: 1273-1274
doi>10.1145/2009916.2010156
Full text: PDF

The cyber-physical systems (CPS) are envisioned as a class of real-time systems integrating the computing, communication and storage facilities with monitoring and control of the physical world. One interesting CPS application in the mobile Internet ...
QuickView: advanced search of tweets
Xiaohua Liu, Long Jiang, Furu Wei, Ming Zhou, QuickView Team Microsoft
Pages: 1275-1276
doi>10.1145/2009916.2010157

Tweets have become a comprehensive repository for real-time information. However, it is often hard for users to quickly get information they are interested in from tweets, owing to the sheer volume of tweets as well as their noisy and informal nature. ...
Personalized video: leanback online video consumption
Krishnan Ramanathan, Yogesh Sankarasubramaniam, Vidhya Govindaraju
Pages: 1277-1278
doi>10.1145/2009916.2010158

Current user interfaces for online video consumption are mostly browser based, lean forward, require constant interaction and provide a fragmented view of the total content available. For easier consumption, the user interface and interactions need to ...
GreenMeter: a tool for assessing the quality and recommending tags for web 2.0 applications
Saulo M.R. Ricci, Dilson A. Guimarães, Fabiano M. Belém, Jussara M. Almeida, Marcos A. Gonçalves, Raquel Prates
Pages: 1279-1280
doi>10.1145/2009916.2010159

We present GreenMeter, a tool for assessing the quality and recommending tags for Web 2.0 content. Its goal is to improve tag quality and the effectiveness of various information services (e.g., search, content recommendation) that rely on tags as data ...
JuSe: a picture dictionary query system for children
Tamara Polajnar, Richard Glassey, Leif Azzopardi
Pages: 1281-1282
doi>10.1145/2009916.2010160

As adults we take for granted our capacity to express our information needs verbally and textually. However, young children also have preferences and information needs, but are just learning to be able to express themselves effectively. Consequently ...
CrowdTracker: enabling community-based real-time web monitoring
James Caverlee, Zhiyuan Cheng, Brian Eoff, Chiao-Fang Hsu, Krishna Kamath, Jeffrey McGee
Pages: 1283-1284
doi>10.1145/2009916.2010161

CrowdTracker is a community-based web monitoring system optimized for real-time web streams like Twitter, Facebook, and Google Buzz. In this demo summary, we provide an overview of the system and architecture, and outline the demonstration plan.
The Meta-Dex Suite: generating and analyzing indexes and meta-indexes
Michael Huggett, Edie Rasmussen
Pages: 1285-1286
doi>10.1145/2009916.2010162

Our Meta-dex software suite extracts content and index text from a corpus of PDF files, and generates a meta-index that references entries across an entire domain. We provide tools to analyze the individual and integrated indexes, and visualize entries ...
Tulsa: web search for writing assistance
Duo Ding, Xingping Jiang, Matthew R. Scott, Ming Zhou, Yong Yu
Pages: 1287-1288
doi>10.1145/2009916.2010163
The TREC files: the (ground) truth is out there
Savvas A. Chatzichristofis, Konstantinos Zagoris, Avi Arampatzis
Pages: 1289-1290
doi>10.1145/2009916.2010164

Traditional tools for information retrieval (IR) evaluation, such as TREC's trec_eval, have outdated command-line interfaces with many unused features, or 'switches', accumulated over the years. They are usually seen as cumbersome applications by new ...
A tool for comparative IR evaluation on component level
Thomas Wilhelm, Jens Kürsten, Maximilian Eibl
Pages: 1291-1292
doi>10.1145/2009916.2010165
TUTORIAL SESSION: Tutorials
Machine learning for information retrieval
Luo Si, Rong Jin
Pages: 1293-1294
doi>10.1145/2009916.2010167

In recent years, we have witnessed successful application of machine learning techniques to a wide range of information retrieval problems, including Web search engines, recommendation systems, online advertising, etc. It is thus critical for researchers ...
Enhancing web search by mining search and browse logs
Daxin Jiang, Jian Pei, Hang Li
Pages: 1295-1296
doi>10.1145/2009916.2010168

Huge amounts of search log data have been accumulated in various search engines. Currently, a commercial search engine receives billions of queries and collects terabytes of log data on any single day. Other than search log data, browse logs can be ...
A new look at old tricks: the fertile roots of current research
Paul B. Kantor
Pages: 1297-1298
doi>10.1145/2009916.2010169

As we face an explosion of potential new applications for the fundamental concepts and technologies of information retrieval, ranging from ad ranking to social media, from collaborative recommending to question answering systems, many researchers are ...
Crowdsourcing for information retrieval: principles, methods, and applications
Omar Alonso, Matthew Lease
Pages: 1299-1300
doi>10.1145/2009916.2010170

Crowdsourcing has emerged in recent years as a promising new avenue for leveraging today's digitally-connected, diverse, distributed workforce. Generally speaking, crowdsourcing describes outsourcing of tasks to a large group of people instead of assigning ...
Practical online retrieval evaluation
Filip Radlinski, Yisong Yue
Pages: 1301-1302
doi>10.1145/2009916.2010171

Online evaluation is amongst the few evaluation techniques available to the information retrieval community that is guaranteed to reflect how users actually respond to improvements developed by the community. Broadly speaking, online evaluation refers ...
Web retrieval: the role of users
Ricardo Baeza-Yates, Yoelle Maarek
Pages: 1303-1304
doi>10.1145/2009916.2010172

Web retrieval methods have evolved through three major steps in the last decade or so. They started from standard document-centric IR in the early days of the Web, then made a major step forward by leveraging the structure of the Web, using link analysis ...
Information organization and retrieval with collaboratively generated content
Eugene Agichtein, Evgeniy Gabrilovich
Pages: 1307-1308
doi>10.1145/2009916.2010173

Proliferation of ubiquitous access to the Internet enables millions of Web users to collaborate online on a variety of activities. Many of these activities result in the construction of large repositories of knowledge, either as their primary aim (e.g., ...
SESSION: Doctoral consortium
Persistence in the ephemeral: utilizing repeat behaviors for multi-session personalized search
Sarah K. Tyler
Pages: 1311-1312
doi>10.1145/2009916.2010175

As the abundance of information on the Internet grows, an increasing burden is placed on the user to specify his or her query precisely in order to avoid extraneous results that may be relevant, but not useful. At the same time, users have a tendency ...
Search engines that learn online
Katja Hofmann
Pages: 1313-1314
doi>10.1145/2009916.2010176

The goal of my research is to develop self-learning search engines that can learn online, i.e., directly from interactions with actual users. Such systems can continuously adapt to user preferences throughout their lifetime, leading to better search ...
Query expansion based on a semantic graph model
Xue Jiang
Pages: 1315-1316
doi>10.1145/2009916.2010177

Query expansion is a classical topic in the field of information retrieval, proposed to bridge the gap between searchers' information intents and their queries. Previous research usually expands queries based on document collections, or some ...
Descriptive modelling of text classification and its integration with other IR tasks
Miguel Martinez-Alvarez
Pages: 1317-1318
doi>10.1145/2009916.2010178

Nowadays, Information Retrieval (IR) systems have to deal with multiple sources of data available in different formats. Datasets can consist of complex and heterogeneous objects with relationships between them. In addition, information needs can vary ...
Efficient and effective solutions for search engines
Xiang-Fei Jia
Pages: 1319-1320
doi>10.1145/2009916.2010179
Modeling document scores for distributed information retrieval
Ilya Markov
Pages: 1321-1322
doi>10.1145/2009916.2010180

Distributed Information Retrieval (DIR), also known as Federated Search, integrates multiple searchable collections and provides direct access to them through a unified interface [3]. This is done by a centralized broker that receives user queries, ...
Improving query and result list adaptation in personalized multilingual information retrieval
M. Rami Ghorab
Pages: 1323-1324
doi>10.1145/2009916.2010181

A general characteristic of Information Retrieval (IR) and Multilingual IR (MIR) [5] systems is that if the same query was submitted by different users, the system would yield the same results, regardless of the user. On the other hand, Adaptive Hypermedia ...
Using k-Top retrieved web snippets to date temporal implicit queries based on web content analysis
Ricardo Nuno Taborda Campos
Pages: 1325-1326
doi>10.1145/2009916.2010182

The World Wide Web (WWW) is a huge information network from which retrieving and organizing quality relevant content remains an open question for almost all ambiguous queries. As an example, many queries have temporal implicit intents associated with ...
Domain-specific information retrieval using recommenders
Wei Li
Pages: 1327-1328
doi>10.1145/2009916.2010183

The continuing increase in the volume of information available in our daily lives is creating ever greater challenges for people to find personally useful information. One approach to addressing this problem is Personalized Information Retrieval ...
Understanding and using contextual information in recommender systems
Licai Wang
Pages: 1329-1330
doi>10.1145/2009916.2010184
Multidimensional search result diversification: diverse search results for diverse users
Sumit Bhatia
Pages: 1331-1332
doi>10.1145/2009916.2010185

Hundreds of millions of people today rely on Web-based search engines to satisfy their information needs. In order to meet the expectations of this vast and diverse user population, the search engine should present a list of results such that the probability ...
SESSION: Industrial track
Sensor-aided mobile information management and retrieval
Edward Y. Chang
Pages: 1333-1334
doi>10.1145/2009916.2010187

The number of "smart" mobile devices such as wireless phones and tablet computers has been rapidly growing. These mobile devices are equipped with a variety of sensors such as camera, gyroscope, accelerometer, compass, NFC, WiFi, GPS, etc. These sensors ...
Predicting eBay listing conversion
Ted Tao Yuan, Zhaohui Chen, Mike Mathieson
Pages: 1335-1336
doi>10.1145/2009916.2010188

At eBay Market Place, listing conversion rate can be measured as the number of items sold divided by the number of items in a sample set. For a given item, conversion rate can also be treated as the probability of sale. By investigating eBay listings' transactional ...
A large scale machine learning system for recommending heterogeneous content in social networks
Yanxin Shi, David Ye, Andrey Goder, Srinivas Narayanan
Pages: 1337-1338
doi>10.1145/2009916.2010189

The goal of the Facebook recommendation engine is to compare and rank heterogeneous types of content in order to find the most relevant recommendations based on user preference and page context. The challenges for such a recommendation engine include ...
Review of MSR-Bing web scale speller challenge
Kuansan Wang, Jan Pedersen
Pages: 1339-1340
doi>10.1145/2009916.2010190

In this paper, we provide an overview of the MSR-Bing Web Scale Speller Challenge of 2011. We describe the motivation and outline the algorithmic and engineering challenges posed by this activity. The design and the evaluation methods are also reviewed, ...
Elsevier SIGIR 2011 application challenge abstract
Jukka Valimaki, Remko Caprio
Pages: 1341-1342
doi>10.1145/2009916.2010191

Elsevier SIGIR 2011 Application Challenge is an international competition that encourages software developers to create applications that run on Elsevier's SciVerse platform. The Challenge is open to all SIGIR 2011 Conference participants.

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Table of Contents
Is the Cranfield paradigm outdated?
Donna Harman
Pages: 1-1
doi>10.1145/1835449.1835450
SESSION: Clustering I
Gabriella Pasi
Prototype hierarchy based clustering for the categorization and navigation of web collections
Zhao-Yan Ming, Kai Wang, Tat-Seng Chua
Pages: 2-9
doi>10.1145/1835449.1835453

This paper presents a novel prototype hierarchy based clustering (PHC) framework for the organization of web collections. It solves simultaneously the problem of categorizing web collections and interpreting the clustering results for navigation. By ...
Person name disambiguation by bootstrapping
Minoru Yoshida, Masaki Ikeda, Shingo Ono, Issei Sato, Hiroshi Nakagawa
Pages: 10-17
doi>10.1145/1835449.1835454

In this paper, we report our system that disambiguates person names in Web search results. The system uses named entities, compound key words, and URLs as features for document similarity calculation, which typically show high precision but low recall ...
Self-taught hashing for fast similarity search
Dell Zhang, Jun Wang, Deng Cai, Jinsong Lu
Pages: 18-25
doi>10.1145/1835449.1835455

The ability of fast similarity search at large scale is of great importance to many Information Retrieval (IR) applications. A promising way to accelerate similarity search is semantic hashing which designs compact binary codes for a large number of ...
SESSION: User models
Ian Ruthven
Personalizing information retrieval for multi-session tasks: the roles of task stage and task type
Jingjing Liu, Nicholas J. Belkin
Pages: 26-33
doi>10.1145/1835449.1835457

Dwell time as a user behavior has been found in previous studies to be an unreliable predictor of document usefulness, with contextual factors such as the user's task needing to be considered in its interpretation. Task stage has been shown to influence ...
Predicting searcher frustration
Henry A. Feild, James Allan, Rosie Jones
Pages: 34-41
doi>10.1145/1835449.1835458

When search engine users have trouble finding information, they may become frustrated, possibly resulting in a bad experience (even if they are ultimately successful). In a user study in which participants were given difficult information seeking tasks, ...
The good, the bad, and the random: an eye-tracking study of ad quality in web search
Georg Buscher, Susan T. Dumais, Edward Cutrell
Pages: 42-49
doi>10.1145/1835449.1835459

We investigate how people interact with Web search engine result pages using eye-tracking. While previous research has focused on the visual attention devoted to the 10 organic search results, this paper examines other components of contemporary search ...
SESSION: Applications I
Luo Si
Ranking using multiple document types in desktop search
Jinyoung Kim, W. Bruce Croft
Pages: 50-57
doi>10.1145/1835449.1835461

A typical desktop environment contains many document types (email, presentations, web pages, PDFs, etc.), each with different metadata. Predicting which types of documents a user is looking for in the context of a given query is a crucial part of providing ...
Acquisition of instance attributes via labeled and related instances
Enrique Alfonseca, Marius Pasca, Enrique Robledo-Arnuncio
Pages: 58-65
doi>10.1145/1835449.1835462

This paper presents a method for increasing the quality of automatically extracted instance attributes by exploiting weakly-supervised and unsupervised instance relatedness data. This data consists of (a) class labels for instances and (b) distributional ...
Relevance and ranking in online dating systems
Fernando Diaz, Donald Metzler, Sihem Amer-Yahia
Pages: 66-73
doi>10.1145/1835449.1835463

Match-making systems refer to systems where users want to meet other individuals to satisfy some underlying need. Examples of match-making systems include dating services, resume/job bulletin boards, community based question answering, and consumer-to-consumer ...
SESSION: Search engine architectures and scalability
Alistair Moffat
Scalability of findability: effective and efficient IR operations in large information networks
Weimao Ke, Javed Mostafa
Pages: 74-81
doi>10.1145/1835449.1835465

It is crucial to study basic principles that support adaptive and scalable retrieval functions in large networked environments such as the Web, where information is distributed among dynamic systems. We conducted experiments on decentralized IR operations ...
Caching search engine results over incremental indices
Roi Blanco, Edward Bortnikov, Flavio Junqueira, Ronny Lempel, Luca Telloli, Hugo Zaragoza
Pages: 82-89
doi>10.1145/1835449.1835466

A Web search engine must update its index periodically to incorporate changes to the Web. We argue in this paper that index updates fundamentally impact the design of search engine result caches, a performance-critical component of modern search engines. ...
Query forwarding in geographically distributed search engines
B. Barla Cambazoglu, Emre Varol, Enver Kayaaslan, Cevdet Aykanat, Ricardo Baeza-Yates
Pages: 90-97
doi>10.1145/1835449.1835467

Query forwarding is an important technique for preserving the result quality in distributed search engines where the index is geographically partitioned over multiple search sites. The key component in query forwarding is the thresholding algorithm by ...
A joint probabilistic classification model for resource selection
Dzung Hong, Luo Si, Paul Bracke, Michael Witt, Tim Juchcinski
Pages: 98-105
doi>10.1145/1835449.1835468

Resource selection is an important task in Federated Search to select a small number of most relevant information sources. Current resource selection algorithms such as GlOSS, CORI, ReDDE, Geometric Average and the recent classification-based method ...
SESSION: Link analysis & advertising
Tie-Yan Liu
Temporal click model for sponsored search
Wanhong Xu, Eren Manavoglu, Erick Cantu-Paz
Pages: 106-113
doi>10.1145/1835449.1835470

Previous studies on search engine click modeling have identified two presentation factors that affect users' behavior: (1) position bias: the same result will get a different number of clicks when displayed in different positions and (2) externalities: ...
Freshness matters: in flowers, food, and web authority
Na Dai, Brian D. Davison
Pages: 114-121
doi>10.1145/1835449.1835471

The collective contributions of billions of users across the globe each day result in an ever-changing web. In verticals like news and real-time search, recency is an obviously significant factor for ranking. However, traditional link-based web ranking ...
The importance of anchor text for ad hoc search revisited
Marijn Koolen, Jaap Kamps
Pages: 122-129
doi>10.1145/1835449.1835472

It is generally believed that propagated anchor text is very important for effective Web search as offered by the commercial search engines. "Google Bombs" are a notable illustration of this. However, many years of TREC Web retrieval research failed ...
Ready to buy or just browsing?: detecting web searcher goals from interaction data
Qi Guo, Eugene Agichtein
Pages: 130-137
doi>10.1145/1835449.1835473

An improved understanding of the relationship between search intent, result quality, and searcher behavior is crucial for improving the effectiveness of web search. While recent progress in user behavior mining has been largely focused on aggregate server-side ...
SESSION: Learning to rank
Hang Li
Learning to efficiently rank
Lidan Wang, Jimmy Lin, Donald Metzler
Pages: 138-145
doi>10.1145/1835449.1835475

It has been shown that learning to rank approaches are capable of learning highly effective ranking functions. However, these approaches have mostly ignored the important issue of efficiency. Given that both efficiency and effectiveness are important ...
Ranking for the conversion funnel
Abraham Bagherjeiran, Andrew O. Hatch, Adwait Ratnaparkhi
Pages: 146-153
doi>10.1145/1835449.1835476

In contextual advertising, advertisers show ads to users so that they will click on them and eventually purchase a product. Optimizing this action sequence, called the conversion funnel, is the ultimate goal of advertising. Advertisers, however, often ...
How good is a span of terms?: exploiting proximity to improve web retrieval
Krysta M. Svore, Pallika H. Kanani, Nazan Khan
Pages: 154-161
doi>10.1145/1835449.1835477

Ranking search results is a fundamental problem in information retrieval. In this paper we explore whether the use of proximity and phrase information can improve web retrieval accuracy. We build on existing research by incorporating novel ranking features ...
Learning to rank only using training data from related domain
Wei Gao, Peng Cai, Kam-Fai Wong, Aoying Zhou
Pages: 162-169
doi>10.1145/1835449.1835478

Like traditional supervised and semi-supervised algorithms, learning to rank for information retrieval requires document annotations provided by domain experts. It is costly to annotate training data for different search domains and tasks. We propose ...
SESSION: Clustering II
Omar Alonso
Optimal meta search results clustering
Claudio Carpineto, Giovanni Romano
Pages: 170-177
doi>10.1145/1835449.1835480

By analogy with merging document rankings, the outputs from multiple search results clustering algorithms can be combined into a single output. In this paper we study the feasibility of meta search results clustering, which has unique features compared ...
Analysis of structural relationships for hierarchical cluster labeling
Markus Muhr, Roman Kern, Michael Granitzer
Pages: 178-185
doi>10.1145/1835449.1835481

Cluster label quality is crucial for browsing topic hierarchies obtained via document clustering. Intuitively, the hierarchical structure should influence the labeling accuracy. However, most labeling algorithms ignore such structural properties and ...
On the existence of obstinate results in vector space models
Milos Radovanović, Alexandros Nanopoulos, Mirjana Ivanović
Pages: 186-193
doi>10.1145/1835449.1835482

The vector space model (VSM) is a popular and widely applied model in information retrieval (IR). VSM creates vector spaces whose dimensionality is usually high (e.g., tens of thousands of terms). This may cause various problems, such as susceptibility ...
SESSION: Filtering and recommendation
Douglas W. Oard
Social media recommendation based on people and tags
Ido Guy, Naama Zwerdling, Inbal Ronen, David Carmel, Erel Uziel
Pages: 194-201
doi>10.1145/1835449.1835484

We study personalized item recommendation within an enterprise social media application suite that includes blogs, bookmarks, communities, wikis, and shared files. Recommendations are based on two of the core elements of social media - people and tags. ...
A network-based model for high-dimensional information filtering
Nikolaos Nanas, Manolis Vavalis, Anne De Roeck
Pages: 202-209
doi>10.1145/1835449.1835485

The Vector Space Model has been and to a great extent still is the de facto choice for profile representation in content-based Information Filtering. However, user profiles represented as weighted keyword vectors have inherent dimensionality problems. ...
Temporal diversity in recommender systems
Neal Lathia, Stephen Hailes, Licia Capra, Xavier Amatriain
Pages: 210-217
doi>10.1145/1835449.1835486

Collaborative Filtering (CF) algorithms, used to build web-based recommender systems, are often evaluated in terms of how accurately they predict user ratings. However, current evaluation techniques disregard the fact that users continue to rate ...
Serendipitous recommendations via innovators
Noriaki Kawamae
Pages: 218-225
doi>10.1145/1835449.1835487

To realize services that provide serendipity, this paper assesses the surprise of each user when presented with recommendations. We propose a recommendation algorithm that focuses on the search time that, in the absence of any recommendation, each user would ...
SESSION: Information retrieval theory
Iadh Ounis
On statistical analysis and optimization of information retrieval effectiveness metrics
Jun Wang, Jianhan Zhu
Pages: 226-233
doi>10.1145/1835449.1835489

This paper presents a new way of thinking for IR metric optimization. It is argued that the optimal ranking problem should be factorized into two distinct yet interrelated stages: the relevance prediction stage and ranking decision stage. During retrieval ...
Information-based models for ad hoc IR
Stéphane Clinchant, Eric Gaussier
Pages: 234-241
doi>10.1145/1835449.1835490

We introduce in this paper the family of information-based models for ad hoc information retrieval. These models draw their inspiration from a long-standing hypothesis in IR, namely the fact that the difference in the behaviors of a word at the ...
Score distribution models: assumptions, intuition, and robustness to score manipulation
Evangelos Kanoulas, Keshi Dai, Virgil Pavlu, Javed A. Aslam
Pages: 242-249
doi>10.1145/1835449.1835491

Inferring the score distribution of relevant and non-relevant documents is an essential task for many IR applications (e.g. information filtering, recall-oriented IR, meta-search, distributed IR). Modeling score distributions in an accurate manner is ...
Refactoring the search problem
Gary William Flake
Pages: 250-250
doi>10.1145/1835449.1835451

The most common way of framing the search problem is as an exchange between a user and a database, where the user issues queries and the database replies with results that satisfy constraints imposed by the query but that also optimize some notion of ...
SESSION: Language models & IR theory
Geometric representations for multiple documents
Jangwon Seo, W. Bruce Croft
Pages: 251-258
doi>10.1145/1835449.1835493

Combining multiple documents to represent an information object is well-known as an effective approach for many Information Retrieval tasks. For example, passages can be combined to represent a document for retrieval, document clusters are represented ...
Using statistical decision theory and relevance models for query-performance prediction
Anna Shtok, Oren Kurland, David Carmel
Pages: 259-266
doi>10.1145/1835449.1835494

We present a novel framework for the query-performance prediction task, that is, estimating the effectiveness of a search performed in response to a query in the absence of relevance judgments. Our approach is based on using statistical decision theory ...
Active learning for ranking through expected loss optimization
Bo Long, Olivier Chapelle, Ya Zhang, Yi Chang, Zhaohui Zheng, Belle Tseng
Pages: 267-274
doi>10.1145/1835449.1835495

Learning to rank arises in many information retrieval applications, ranging from Web search engines and online advertising to recommendation systems. In learning to rank, the performance of a ranking model is strongly affected by the number of labeled examples ...
SESSION: Query representations & reformulations
Maarten de Rijke
Image search by concept map
Hao Xu, Jingdong Wang, Xian-Sheng Hua, Shipeng Li
Pages: 275-282
doi>10.1145/1835449.1835497

In this paper, we present a novel image search system, image search by concept map. This system enables users to indicate not only what semantic concepts are expected to appear but also how these concepts are spatially distributed in the desired ...
Generalized syntactic and semantic models of query reformulation
Amac Herdagdelen, Massimiliano Ciaramita, Daniel Mahler, Maria Holmqvist, Keith Hall, Stefan Riezler, Enrique Alfonseca
Pages: 283-290
doi>10.1145/1835449.1835498

We present a novel approach to query reformulation which combines syntactic and semantic information by means of generalized Levenshtein distance algorithms where the substitution operation costs are based on probabilistic term rewrite functions. We ...
Evaluating verbose query processing techniques
Samuel Huston, W. Bruce Croft
Pages: 291-298
doi>10.1145/1835449.1835499

Verbose or long queries are a small but significant part of the query stream in web search, and are common in other applications such as collaborative question answering (CQA). Current search engines perform well with keyword queries but are not, in ...
SESSION: Automatic classification
Eric Gaussier
SED: supervised experimental design and its application to text classification
Yi Zhen, Dit-Yan Yeung
Pages: 299-306
doi>10.1145/1835449.1835501

In recent years, active learning methods based on experimental design have achieved state-of-the-art performance in text classification applications. Although these methods can exploit the distribution of unlabeled data and support batch selection, they cannot ...
Temporally-aware algorithms for document classification
Thiago Salles, Leonardo Rocha, Gisele L. Pappa, Fernando Mourão, Wagner Meira, Jr., Marcos Gonçalves
Pages: 307-314
doi>10.1145/1835449.1835502

Automatic Document Classification (ADC) is still one of the major information retrieval problems. It usually employs a supervised learning strategy, where we first build a classification model using pre-classified documents and then use this model to ...
Multilabel classification with meta-level features
Siddharth Gopal, Yiming Yang
Pages: 315-322
doi>10.1145/1835449.1835503

Effective learning in multi-label classification (MLC) requires an appropriate level of abstraction for representing the relationship between each instance and multiple categories. Current MLC methods have focused on learning-to-map from instances ...
SESSION: Retrieval models and ranking
Djoerd Hiemstra
Estimation of statistical translation models based on mutual information for ad hoc information retrieval
Maryam Karimzadehgan, ChengXiang Zhai
Pages: 323-330
doi>10.1145/1835449.1835505

As a principled approach to capturing semantic relations of words in information retrieval, statistical translation models have been shown to outperform simple document language models which rely on exact matching of words in the query and documents. ...
DivQ: diversification for keyword search over structured databases
Elena Demidova, Peter Fankhauser, Xuan Zhou, Wolfgang Nejdl
Pages: 331-338
doi>10.1145/1835449.1835506

Keyword queries over structured databases are notoriously ambiguous. No single interpretation of a keyword query can satisfy all users, and multiple interpretations may yield overlapping results. This paper proposes a scheme to balance the relevance ...
Finding support sentences for entities
Roi Blanco, Hugo Zaragoza
Pages: 339-346
doi>10.1145/1835449.1835507

We study the problem of finding sentences that explain the relationship between a named entity and an ad-hoc query, which we refer to as entity support sentences. This is an important sub-problem of entity ranking which, to the best of our knowledge, ...
Estimating probabilities for effective data fusion
David Lillis, Lusheng Zhang, Fergus Toolan, Rem W. Collier, David Leonard, John Dunnion
Pages: 347-354
doi>10.1145/1835449.1835508

Data Fusion is the combination of a number of independent search results, relating to the same document collection, into a single result to be presented to the user. A number of probabilistic data fusion models have been shown to be effective in empirical ...
SESSION: User feedback & user models
Nicholas J. Belkin
Incorporating post-click behaviors into a click model
Feimin Zhong, Dong Wang, Gang Wang, Weizhu Chen, Yuchen Zhang, Zheng Chen, Haixun Wang
Pages: 355-362
doi>10.1145/1835449.1835510

Much work has attempted to model a user's click-through behavior by mining the click logs. The task is not trivial due to the well-known position bias problem. Some breakthroughs have been made: two newly proposed click models, DBN and CCM, addressed ...
Interactive retrieval based on faceted feedback
Lanbo Zhang, Yi Zhang
Pages: 363-370
doi>10.1145/1835449.1835511

Motivated by the commonly used faceted search interface in e-commerce, this paper investigates an interactive relevance feedback mechanism based on faceted document metadata. In this mechanism, the system recommends a group of document facet-value pairs, ...
A comparison of general vs personalised affective models for the prediction of topical relevance
Ioannis Arapakis, Konstantinos Athanasakos, Joemon M. Jose
Pages: 371-378
doi>10.1145/1835449.1835512

Information retrieval systems face a number of challenges, originating mainly from the semantic gap problem. Implicit feedback techniques have been employed in the past to address many of these issues. Although this was a step in the right direction, ...
Understanding web browsing behaviors through Weibull analysis of dwell time
Chao Liu, Ryen W. White, Susan Dumais
Pages: 379-386
doi>10.1145/1835449.1835513

Dwell time on Web pages has been extensively used for various information retrieval tasks. However, some basic yet important questions have not been sufficiently addressed, e.g., what distribution is appropriate to model the distribution of dwell ...
SESSION: Web IR and social media search
Hugo Zaragoza
Segmentation of multi-sentence questions: towards effective question retrieval in cQA services
Kai Wang, Zhao-Yan Ming, Xia Hu, Tat-Seng Chua
Pages: 387-394
doi>10.1145/1835449.1835515

Existing question retrieval models work relatively well in finding similar questions in community-based question answering (cQA) services. However, they are designed for single-sentence queries or bag-of-words representations, and are not sufficient to ...
Mining the blogosphere for top news stories identification
Yeha Lee, Hun-young Jung, Woosang Song, Jong-Hyeok Lee
Pages: 395-402
doi>10.1145/1835449.1835516

The analysis of query logs from blog search engines shows that news-related queries occupy a significant portion of the logs. This raises an interesting research question on whether the blogosphere can be used to identify important news stories. In this ...
Proximity-based opinion retrieval
Shima Gerani, Mark James Carman, Fabio Crestani
Pages: 403-410
doi>10.1145/1835449.1835517

Blog post opinion retrieval aims at finding blog posts that are relevant and opinionated about a user's query. In this paper we propose a simple probabilistic model for assigning relevant opinion scores to documents. The key problem is how to capture ...
Evaluating and predicting answer quality in community QA
Chirag Shah, Jefferey Pomerantz
Pages: 411-418
doi>10.1145/1835449.1835518

Question answering (QA) helps one go beyond traditional keyword-based querying and retrieve information in a more precise form than that given by a document or a list of documents. Several community-based QA (CQA) services have emerged allowing information ...
SESSION: Document structure & adversarial information retrieval
Mounia Lalmas
Adaptive near-duplicate detection via similarity learning
Hannaneh Hajishirzi, Wen-tau Yih, Aleksander Kolcz
Pages: 419-426
doi>10.1145/1835449.1835520

In this paper, we present a novel near-duplicate document detection method that can easily be tuned for a particular domain. Our method represents each document as a real-valued sparse k-gram vector, where the weights are learned to optimize for ...
A content based approach for discovering missing anchor text for web search
Xing Yi, James Allan
Pages: 427-434
doi>10.1145/1835449.1835521

Although anchor text provides very useful information for web search, a large portion of web pages have few or no incoming hyperlinks (anchors), which is known as the anchor text sparsity problem. In this paper, we propose a language modeling based technique ...
Uncovering social spammers: social honeypots + machine learning
Kyumin Lee, James Caverlee, Steve Webb
Pages: 435-442
doi>10.1145/1835449.1835522

Web-based social systems enable new community-based opportunities for participants to engage, share, and interact. This community value and related services like search and advertising are threatened by spammers, content polluters, and malware disseminators. ...
SESSION: Users and interactive IR
David Carmel
Studying trailfinding algorithms for enhanced web search
Adish Singla, Ryen White, Jeff Huang
Pages: 443-450
doi>10.1145/1835449.1835524

Search engines return ranked lists of Web pages in response to queries. These pages are starting points for post-query navigation, but may be insufficient for search tasks involving multiple steps. Search trails mined from toolbar logs start with a query ...
Context-aware ranking in web search
Biao Xiang, Daxin Jiang, Jian Pei, Xiaohui Sun, Enhong Chen, Hang Li
Pages: 451-458
doi>10.1145/1835449.1835525

The context of a search query often provides a search engine with meaningful hints for answering the current query better. Previous studies on context-aware search were either focused on the development of context models or limited to a relatively small scale ...
Collecting high quality overlapping labels at low cost
Hui Yang, Anton Mityagin, Krysta M. Svore, Sergey Markov
Pages: 459-466
doi>10.1145/1835449.1835526

This paper studies the quality of human labels used to train search engines' rankers. Our specific focus is the performance improvements obtained by using overlapping relevance labels, that is, by collecting multiple human judgments for each training sample. ...
SESSION: Document representation and content analysis
Marie-Francine Moens
Multi-style language model for web scale information retrieval
Kuansan Wang, Xiaolong Li, Jianfeng Gao
Pages: 467-474
doi>10.1145/1835449.1835528

Web documents are typically associated with many text streams, including the body, the title and the URL that are determined by the authors, and the anchor text or search queries used by others to refer to the documents. Through a systematic large scale ...
Combining coregularization and consensus-based self-training for multilingual text categorization
Massih R. Amini, Cyril Goutte, Nicolas Usunier
Pages: 475-482
doi>10.1145/1835449.1835529

We investigate the problem of learning document classifiers in a multilingual setting, from collections where labels are only partially available. We address this problem in the framework of multiview learning, where different languages correspond to ...
Towards subjectifying text clustering
Sajib Dasgupta, Vincent Ng
Pages: 483-490
doi>10.1145/1835449.1835530

Although it is common practice to produce only a single clustering of a dataset, in many cases text documents can be clustered along different dimensions. Unfortunately, not only do traditional text clustering algorithms fail to produce multiple clusterings ...
SESSION: Summarization & user feedback
Elizabeth D. Liddy
EUSUM: extracting easy-to-understand English summaries for non-native readers
Xiaojun Wan, Huiying Li, Jianguo Xiao
Pages: 491-498
doi>10.1145/1835449.1835532

In this paper we investigate a novel and important problem in multi-document summarization, i.e., how to extract an easy-to-understand English summary for non-native readers. Existing summarization systems extract the same kind of English summaries from ...
Visual summarization of web pages
Binxing Jiao, Linjun Yang, Jizheng Xu, Feng Wu
Pages: 499-506
doi>10.1145/1835449.1835533

Visual summarization is an attractive new scheme to summarize web pages, which can help achieve a more friendly user experience in search and re-finding tasks by allowing users to quickly get the idea of what the web page is about and helping users recall ...
Learning more powerful test statistics for click-based retrieval evaluation
Yisong Yue, Yue Gao, Oliver Chapelle, Ya Zhang, Thorsten Joachims
Pages: 507-514
doi>10.1145/1835449.1835534

Interleaving experiments are an attractive methodology for evaluating retrieval functions through implicit feedback. Designed as a blind and unbiased test for eliciting a preference between two retrieval functions, an interleaved ranking of the results ...
SESSION: Query log analysis
Yoelle Maarek
Query similarity by projecting the query-flow graph
Ilaria Bordino, Carlos Castillo, Debora Donato, Aristides Gionis
Pages: 515-522
doi>10.1145/1835449.1835536

Defining a measure of similarity between queries is an interesting and difficult problem. A reliable query-similarity measure can be used in a variety of applications such as query recommendation, query expansion, and advertising. In this paper, we exploit ...
The demographics of web search
Ingmar Weber, Carlos Castillo
Pages: 523-530
doi>10.1145/1835449.1835537

How does the web search behavior of "rich" and "poor" people differ? Do men and women tend to click on different results for the same query? What are some queries almost exclusively issued by African Americans? These are some of the questions we address ...
A user behavior model for average precision and its generalization to graded judgments
Georges Dupret, Benjamin Piwowarski
Pages: 531-538
doi>10.1145/1835449.1835538

We explore a set of hypotheses on user behavior that are potentially at the origin of the (Mean) Average Precision (AP) metric. This allows us to propose a more realistic version of AP where users click non-deterministically on relevant documents and ...
SESSION: Test-collections
John Tait
The effect of assessor error on IR system evaluation
Ben Carterette, Ian Soboroff
Pages: 539-546
doi>10.1145/1835449.1835540

Recent efforts in test collection building have focused on scaling back the number of necessary relevance judgments and then scaling up the number of search topics. Since the largest source of variation in a Cranfield-style experiment comes from the ...
Reusable test collections through experimental design
Ben Carterette, Evangelos Kanoulas, Virgil Pavlu, Hui Fang
Pages: 547-554
doi>10.1145/1835449.1835541

Portable, reusable test collections are a vital part of research and development in information retrieval. Reusability is difficult to assess, however. The standard approach--simulating judgment collection when groups of systems are held out, then evaluating ...
Do user preferences and evaluation measures line up?
Mark Sanderson, Monica Lestari Paramita, Paul Clough, Evangelos Kanoulas
Pages: 555-562
doi>10.1145/1835449.1835542

This paper presents results comparing user preference for search engine rankings with measures of effectiveness computed from a test collection. It establishes that preferences and evaluation measures correlate: systems measured as better on a test collection ...
SESSION: Query analysis
Ricardo Baeza-Yates
Estimating advertisability of tail queries for sponsored search
Sandeep Pandey, Kunal Punera, Marcus Fontoura, Vanja Josifovski
Pages: 563-570
doi>10.1145/1835449.1835544

Sponsored search is one of the major sources of revenue for search engines on the World Wide Web. It has been observed that while showing ads for every query maximizes short-term revenue, irrelevant ads lead to poor user experience and less revenue in ...
Exploring reductions for long web queries
Niranjan Balasubramanian, Giridhar Kumaran, Vitor R. Carvalho
Pages: 571-578
doi>10.1145/1835449.1835545

Long queries form a difficult, but increasingly important segment for web search engines. Query reduction, a technique for dropping unnecessary query terms from long queries, improves performance of ad-hoc retrieval on TREC collections. Also, it has ...
Positional relevance model for pseudo-relevance feedback
Yuanhua Lv, ChengXiang Zhai
Pages: 579-586
doi>10.1145/1835449.1835546

Pseudo-relevance feedback is an effective technique for improving retrieval results. Traditional feedback algorithms use a whole feedback document as a unit to extract words for query expansion, which is not optimal as a document may cover several different ...
SESSION: Effectiveness measures
Ian Soboroff
Assessing the scenic route: measuring the value of search trails in web logs
Ryen W. White, Jeff Huang
Pages: 587-594
doi>10.1145/1835449.1835548

Search trails mined from browser or toolbar logs comprise queries and the post-query pages that users visit. Implicit endorsements from many trails can be useful for search result ranking, where the presence of a page on a trail increases its query relevance. ...
Human performance and retrieval precision revisited
Mark D. Smucker, Chandra Prakash Jethani
Pages: 595-602
doi>10.1145/1835449.1835549

Several studies have found that the Cranfield approach to evaluation can report significant performance differences between retrieval systems for which little to no performance difference is found for humans completing tasks with these systems. We revisit ...
Extending average precision to graded relevance judgments
Stephen E. Robertson, Evangelos Kanoulas, Emine Yilmaz
Pages: 603-610
doi>10.1145/1835449.1835550

Evaluation metrics play a critical role both in the context of comparative evaluation of the performance of retrieval systems and in the context of learning-to-rank (LTR) as objective functions to be optimized. Many different evaluation metrics have ...
PRES: a score metric for evaluating recall-oriented information retrieval applications
Walid Magdy, Gareth J.F. Jones
Pages: 611-618
doi>10.1145/1835449.1835551

Information retrieval (IR) evaluation scores are generally designed to measure the effectiveness with which relevant documents are identified and retrieved. Many scores have been proposed for this purpose over the years. These have primarily focused ...
SESSION: Multimedia information retrieval
Tat Seng Chua
Content-enriched classifier for web video classification
Bin Cui, Ce Zhang, Gao Cong
Pages: 619-626
doi>10.1145/1835449.1835553

With the explosive growth of online videos, automatic real-time categorization of Web videos plays a key role in organizing, browsing and retrieving the huge amount of videos on the Web. Previous work shows that, in addition to text features, content ...
Robust audio identification for MP3 popular music
Wei Li, Yaduo Liu, Xiangyang Xue
Pages: 627-634
doi>10.1145/1835449.1835554

Audio identification via fingerprint has been an active research field with wide applications for years. Many technical papers were published and commercial software systems were also employed. However, most of these previously reported methods work ...
Effective music tagging through advanced statistical modeling
Jialie Shen, Wang Meng, Shuichang Yan, HweeHwa Pang, Xiansheng Hua
Pages: 635-642
doi>10.1145/1835449.1835555

Music information retrieval (MIR) holds great promise as a technology for managing large music archives. One of the key components of MIR that has been actively researched is music tagging. While significant progress has been achieved, most of the ...
Properties of optimally weighted data fusion in CBMIR
Peter Wilkins, Alan F. Smeaton, Paul Ferguson
Pages: 643-650
doi>10.1145/1835449.1835556

Content-Based Multimedia Information Retrieval (CBMIR) systems which leverage multiple retrieval experts (En) often employ a weighting scheme when combining expert results through data fusion. Typically, however, a query will comprise ...
SESSION: Non-English IR & evaluation
Jaana Kekäläinen
To translate or not to translate?
Chia-Jung Lee, Chin-Hui Chen, Shao-Hang Kao, Pu-Jen Cheng
Pages: 651-658
doi>10.1145/1835449.1835558

Query translation is an important task in cross-language information retrieval (CLIR) aiming to translate queries into languages used in documents. The purpose of this paper is to investigate the necessity of translating query terms, which might differ ...
Multilingual PRF: English lends a helping hand
Manoj K. Chinnakotla, Karthik Raman, Pushpak Bhattacharyya
Pages: 659-666
doi>10.1145/1835449.1835559

In this paper, we present a novel approach to Pseudo-Relevance Feedback (PRF) called Multilingual PRF (MultiPRF). The key idea is to harness multilinguality. Given a query in a language, we take the help of another language to ameliorate the well known ...
Comparing the sensitivity of information retrieval metrics
Filip Radlinski, Nick Craswell
Pages: 667-674
doi>10.1145/1835449.1835560

Information retrieval effectiveness is usually evaluated using measures such as Normalized Discounted Cumulative Gain (NDCG), Mean Average Precision (MAP) and Precision at some cutoff (Precision@k) on a set of judged queries. Recent research has suggested ...
SESSION: Applications II
David D. Lewis
Efficient partial-duplicate detection based on sequence matching
Qi Zhang, Yue Zhang, Haomin Yu, Xuanjing Huang
Pages: 675-682
doi>10.1145/1835449.1835562

With the ever-increasing growth of the Internet, numerous copies of documents have become a serious problem for search engines, opinion mining and many other web applications. Since partial-duplicates only contain a small piece of text taken from other sources ...
Discriminative models of integrating document evidence and document-candidate associations for expert search
Yi Fang, Luo Si, Aditya P. Mathur
Pages: 683-690
doi>10.1145/1835449.1835563

Generative models such as statistical language modeling have been widely studied in the task of expert search to model the relationship between experts and their expertise indicated in supporting documents. On the other hand, discriminative models have ...
Vertical selection in the presence of unlabeled verticals
Jaime Arguello, Fernando Diaz, Jean-François Paiement
Pages: 691-698
doi>10.1145/1835449.1835564

Vertical aggregation is the task of incorporating results from specialized search engines or verticals (e.g., images, video, news) into Web search results. Vertical selection is the subtask of deciding, given a query, which verticals, if any, are relevant. ...
DEMONSTRATION SESSION: Demonstrations
iCollaborate: harvesting value from enterprise web usage
Ajinkya Kale, Thomas Burris, Bhavesh Shah, T L Prasanna Venkatesan, Lakshmanan Velusamy, Manish Gupta, Melania Degerattu
Pages: 699-699
doi>10.1145/1835449.1835566

We are in a phase of 'Participatory Web' in which users add 'value' to the information on the web by publishing, tagging and sharing. The Participatory Web has enormous potential for an enterprise because unlike the users of the internet an enterprise ...
Exploring desktop resources based on user activity analysis
Yukun Li, Xiangyu Zhang, Xiaofeng Meng
Pages: 700-700
doi>10.1145/1835449.1835567

Relocation in personal desktop resources is an interesting and promising research topic. This demonstration illustrates a new perspective in exploring desktop resources to help users re-find expected data resources more effectively. Different from existing ...
A data-parallel toolkit for information retrieval
Dennis Fetterly, Frank McSherry
Pages: 701-701
doi>10.1145/1835449.1835568
Finding and filtering information for children
Desmond Elliot, Richard Glassey, Tamara Polajnar, Leif Azzopardi
Pages: 702-702
doi>10.1145/1835449.1835569

Children face several challenges when using information access systems. These include formulating queries, judging the relevance of documents, and focusing attention on interface cues, such as query suggestions, while typing queries. It has also been ...
Automatic content linking: speech-based just-in-time retrieval for multimedia archives
Andrei Popescu-Belis, Jonathan Kilgour, Peter Poller, Alexandre Nanchen, Erik Boertjes, Joost de Wit
Pages: 703-703
doi>10.1145/1835449.1835570

The Automatic Content Linking Device monitors a conversation and uses automatically recognized words to retrieve documents that are of potential use to the participants. The document set includes project-related reports or emails, transcribed snippets ...
Si-Fi: interactive similar item finder
Inbeom Hwang, Minsuk Kahng, Sung Eun Park, Jinwook Seo, Sang-goo Lee
Pages: 704-704
doi>10.1145/1835449.1835571
Suggesting related topics in web search
Santosh Raju, Shaishav Kumar, Raghavendra Udupa
Pages: 705-705
doi>10.1145/1835449.1835572

Suggesting topics that are related to a user's goal or interest is very important in web search. However, search engines today focus mainly on suggesting reformulations and lexical variants of the query mined from query logs. In this demonstration, we ...
Agro-Gator: digesting experts, logs, and N-grams
Michael Huggett
Pages: 706-706
doi>10.1145/1835449.1835573

As research includes more and larger user studies, a significant problem lies in combining the many types of data files into a single table suitable for analysis by common statistical tools. We have developed a data-aggregation tool that combines user ...
Medical search and classification tools for recommendation
Jimmy Xiangji Huang, Aijun An, Qinmin Hu
Pages: 707-707
doi>10.1145/1835449.1835574

their patients' records from paper to computer, enormous amounts of electronic medical records (EMR) have become available for medical research. Some of the EMR data are well-structured, for which traditional database management systems can provide effective ...
Multilingual people search
Shaishav Kumar, Raghavendra Udupa
Pages: 708-708
doi>10.1145/1835449.1835575

People Search is an important search service with multiple applications (e.g., looking up a friend on Facebook, finding colleagues in corporate email directories, etc.). With the proportion of non-English users on a steady rise, people search services are ...
POSTER SESSION: Poster presentations
Closed form solution of similarity algorithms
Yuanzhe Cai, Miao Zhang, Chris Ding, Sharma Chakravarthy
Pages: 709-710
doi>10.1145/1835449.1835577

Algorithms defining similarities between objects of an information network are important for many IR tasks. The SimRank algorithm and its variations are widely used in many applications, and many fast algorithms have also been developed. In this note, we first reformulate ...
Blog snippets: a comments-biased approach
Javier Parapar, Jorge López-Castro, Álvaro Barreiro
Pages: 711-712
doi>10.1145/1835449.1835578

In recent years, Blog Search has become an exciting new task in Information Retrieval. The presence of user-generated information with valuable opinions makes this field of great interest. In this poster we use part of this information, the readers' comments, ...
SIGIR: scholar vs. scholars' interpretation
James Lanagan, Alan F. Smeaton
Pages: 713-714
doi>10.1145/1835449.1835579

Google Scholar allows researchers to search through a free and extensive source of information on scientific publications. In this paper we show that within the limited context of SIGIR proceedings, the rankings created by Google Scholar are both significantly ...
Effective query expansion with the resistance distance based term similarity metric
Shuguang Wang, Milos Hauskrecht
Pages: 715-716
doi>10.1145/1835449.1835580

In this paper, we define a new query expansion method that relies on a term similarity metric derived from the electric resistance network. This proposed metric lets us measure the mutual relevancy between terms and between their groups. This paper ...
A method to automatically construct a user knowledge model in a forum environment
Ahmad Kardan, Mehdi Garakani, Bamdad Bahrani
Pages: 717-718
doi>10.1145/1835449.1835581

Having a mechanism to validate the opinions and to identify experts in a forum could help people to favor one opinion over another. To achieve this, some solutions have already been introduced, including social network analysis techniques and reputation ...
Learning to rank audience for behavioral targeting
Ning Liu, Jun Yan, Dou Shen, Depin Chen, Zheng Chen, Ying Li
Pages: 719-720
doi>10.1145/1835449.1835582

Behavioral Targeting (BT) is a recent trend in the online advertising market. However, in some classical BT solutions, which predefine the user segments for BT ad delivery, the segments are sometimes too large for numerous long-tail advertisers, who cannot afford to buy ...
Multi-modal query expansion for web video search
Bailan Feng, Juan Cao, Zhineng Chen, Yongdong Zhang, Shouxun Lin
Pages: 721-722
doi>10.1145/1835449.1835583

Query expansion is an effective method to improve the usability of multimedia search. Most existing multimedia search engines are able to automatically expand a list of textual query terms based on text search techniques, which can be called textual ...
Context aware query classification using dynamic query window and relationship net
Nazli Goharian, Saket S.R. Mengle
Pages: 723-724
doi>10.1145/1835449.1835584

The context of the user queries, preceding a given query, is utilized to improve the effectiveness of query classification. Earlier efforts utilize a fixed number of preceding queries to derive such context information. We propose and evaluate an approach ...
Predicting query potential for personalization, classification or regression?
Chen Chen, Muyun Yang, Sheng Li, Tiejun Zhao, Haoliang Qi
Pages: 725-726
doi>10.1145/1835449.1835585

The goal of predicting query potential for personalization is to determine which queries can benefit from personalization. In this paper, we investigate which kind of strategy is better for this task: classification or regression. We quantify the potential ...
The impact of collection size on relevance and diversity
Marijn Koolen, Jaap Kamps
Pages: 727-728
doi>10.1145/1835449.1835586

It has been observed that precision increases with collection size. One explanation could be that the redundancy of information increases, making it easier to find multiple documents conveying the same information. Arguably, a user has no interest in ...
Spatial relationships in visual graph modeling for image categorization
Trong-Ton Pham, Philippe Mulhem, Loic Maisonnasse
Pages: 729-730
doi>10.1145/1835449.1835587

In this paper, a language model adapted to graph-based representation of image content is proposed and assessed. The full indexing and retrieval processes are evaluated on two different image corpora. We show that using the spatial relationships with ...
A picture is worth a thousand search results: finding child-oriented multimedia results with collAge
Karl Gyllstrom, Marie-Francine Moens
Pages: 731-732
doi>10.1145/1835449.1835588

We present a simple and effective approach to complement search results for children's web queries with child-oriented multimedia results, such as coloring pages and music sheets. Our approach determines appropriate media types for a query by searching ...
Query recovery of short user queries: on query expansion with stopwords
Johannes Leveling, Gareth J.F. Jones
Pages: 733-734
doi>10.1145/1835449.1835589

User queries to search engines are observed to predominantly contain inflected content words but lack stopwords and capitalization. Thus, they often resemble natural language queries after case folding and stopword removal. Query recovery aims to generate ...
Where to start filtering redundancy?: a cluster-based approach
Ronald T. Fernandez, Javier Parapar, David E. Losada, Alvaro Barreiro
Pages: 735-736
doi>10.1145/1835449.1835590

Novelty detection is a difficult task, particularly at sentence level. Most of the approaches proposed in the past consist of re-ordering all sentences following their novelty scores. However, this re-ordering usually has little value. In fact, a naive ...
Flickr group recommendation based on tensor decomposition
Nan Zheng, Qiudan Li, Shengcai Liao, Leiming Zhang
Pages: 737-738
doi>10.1145/1835449.1835591

Over the last few years, Flickr has gained massive popularity, and groups in Flickr are one of the main ways for photo diffusion. However, the huge volume of groups makes it difficult for users to decide which group to choose. In this paper, we propose a ...
Robust music identification based on low-order Zernike moment in the compressed domain
Wei Li, Yaduo Liu, Xiangyang Xue
Pages: 739-740
doi>10.1145/1835449.1835592

In this paper, we devise a novel robust music identification algorithm utilizing compressed-domain audio Zernike moment adapted from image processing techniques as the pivotal feature. The audio fingerprint derived from this feature exhibits strong robustness ...
Estimating interference in the QPRP for subtopic retrieval
Guido Zuccon, Leif Azzopardi, Claudia Hauff, C.J. Keith van Rijsbergen
Pages: 741-742
doi>10.1145/1835449.1835593

The Quantum Probability Ranking Principle (QPRP) has been recently proposed, and accounts for interdependent document relevance when ranking. However, to be instantiated, the QPRP requires a method to approximate the "interference" between two documents. ...
Query quality: user ratings and system predictions
Claudia Hauff, Franciska de Jong, Diane Kelly, Leif Azzopardi
Pages: 743-744
doi>10.1145/1835449.1835594

Numerous studies have examined the ability of query performance prediction methods to estimate a query's quality for system effectiveness measures (such as average precision). However, little work has explored the relationship between these methods and ...
Multi-field learning for email spam filtering
Wuying Liu, Ting Wang
Pages: 745-746
doi>10.1145/1835449.1835595
Full text: PDF

Through the investigation of email document structure, this paper proposes a multi-field learning (MFL) framework, which breaks the multi-field document Text Classification (TC) problem into several sub-document TC problems, and makes the final category ...
Language-model-based pro/con classification of political text
Rawia Awadallah, Maya Ramanath, Gerhard Weikum
Pages: 747-748
doi>10.1145/1835449.1835596
Full text: PDF

Given a controversial political topic, our aim is to classify documents debating the topic into pro or con. Our approach extracts topic related terms, pro/con related terms, and pairs of topic related and pro/con related terms and uses them as the basis ...
Intent boundary detection in search query logs
Chieh-Jen Wang, Kevin Hsin-Yih Lin, Hsin-Hsi Chen
Pages: 749-750
doi>10.1145/1835449.1835597
Full text: PDF

Identifying intent boundaries in search query logs is important for learning users' behaviors and applying their experiences. Time-based, query-based, and cluster-based approaches are proposed. Experiments show that the integration of intent clusters and ...
Semi-supervised spam filtering using aggressive consistency learning
Mona Mojdeh, Gordon V. Cormack
Pages: 751-752
doi>10.1145/1835449.1835598
Full text: PDF

A graph based semi-supervised method for email spam filtering, based on the local and global consistency method, yields low error rates with very few labeled examples. The motivating application of this method is spam filters with access to very few ...
Entropy descriptor for image classification
Hongyu Li, Junyu Niu, Jiachen Chen, Huibo Liu
Pages: 753-754
doi>10.1145/1835449.1835599
Full text: PDF

This paper presents a novel entropy descriptor in the sense of geometric manifolds. With this descriptor, entropy cycles can be easily designed for image classification. Minimizing this entropy leads to an optimal entropy cycle where images are connected ...
Has portfolio theory got any principles?
Guido Zuccon, Leif Azzopardi, C.J. "Keith" van Rijsbergen
Pages: 755-756
doi>10.1145/1835449.1835600
Full text: PDF

Recently, Portfolio Theory (PT) has been proposed for Information Retrieval. However, under non-trivial conditions PT violates the original Probability Ranking Principle (PRP). In this poster, we shall explore whether PT upholds a different ranking principle ...
Re-examination on lam% in spam filtering
Haoliang Qi, Muyun Yang, Xiaoning He, Sheng Li
Pages: 757-758
doi>10.1145/1835449.1835601
Full text: PDF

Logistic average misclassification percentage (lam%) is a key measure of spam filtering performance. This paper demonstrates that a spam filter can achieve a perfect 0.00% in lam%, the minimal value in theory, by simply setting a biased threshold ...
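The following minimal Python sketch computes lam% as the TREC Spam Track defines it. The measure is the inverse logit of the mean of the logits of the ham misclassification rate (hm%) and the spam misclassification rate (sm%). It shows why a zero rate collapses the measure. The function names are hypothetical, and the sketch assumes zero rates are not smoothed. This is not the authors' code.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a proportion; -inf at 0 and +inf at 1 (no smoothing, an assumption)."""
    if p <= 0.0:
        return -math.inf
    if p >= 1.0:
        return math.inf
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Numerically stable inverse logit; maps -inf to 0.0, +inf to 1.0, NaN to NaN."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def lam_percent(ham_misclassified: int, ham_total: int,
                spam_misclassified: int, spam_total: int) -> float:
    """Logistic average misclassification percentage of a filter at one threshold."""
    hm = ham_misclassified / ham_total    # ham misclassification rate
    sm = spam_misclassified / spam_total  # spam misclassification rate
    return 100.0 * inv_logit((logit(hm) + logit(sm)) / 2.0)
```

Consider a threshold that misclassifies no ham but still lets 50 of 1000 spam messages through. Here `lam_percent(0, 1000, 50, 1000)` returns 0.0, because logit(0) is negative infinity. If the threshold also misses every spam message, the two infinite logits cancel and the result is NaN.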
Unsupervised estimation of dirichlet smoothing parameters
Jangwon Seo, W. Bruce Croft
Pages: 759-760
doi>10.1145/1835449.1835602
Full text: PDF

A standard approach for determining a Dirichlet smoothing parameter is to choose a value which maximizes a retrieval performance metric using training data consisting of queries and relevance judgments. There are, however, situations where training data ...
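The Dirichlet-smoothed query-likelihood model has one parameter, μ, which is the parameter discussed above. It scores a document d for query q as the sum over query terms w of log((tf(w,d) + μ·P(w|C)) / (|d| + μ)). The minimal sketch below implements that formula plus the supervised grid search that the abstract calls the standard approach. Several details are illustrative assumptions:

- the hypothetical `evaluate` callback,
- the default μ = 2000,
- skipping terms unseen in the collection.

The poster's own unsupervised estimator is not reproduced.

```python
import math
from collections import Counter
from typing import Callable, Dict, Iterable, List


def dirichlet_score(
    query_terms: List[str],
    doc_terms: List[str],
    collection_counts: Dict[str, int],
    collection_length: int,
    mu: float = 2000.0,
) -> float:
    """Query log-likelihood of a document under Dirichlet smoothing."""
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for w in query_terms:
        p_collection = collection_counts.get(w, 0) / collection_length
        if p_collection == 0.0:
            # Term never occurs in the collection: skipped here (an assumption).
            continue
        score += math.log((tf[w] + mu * p_collection) / (doc_len + mu))
    return score


def pick_mu(candidates: Iterable[float], evaluate: Callable[[float], float]) -> float:
    """Supervised baseline: return the candidate mu with the best training metric.

    `evaluate` is a hypothetical callback, e.g. mean average precision over
    training queries with relevance judgments.
    """
    return max(candidates, key=evaluate)
```

The grid search only works when training queries and relevance judgments exist. That requirement is the limitation the abstract points to.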
Comparing click-through data to purchase decisions for retrieval evaluation
Katja Hofmann, Bouke Huurnink, Marc Bron, Maarten de Rijke
Pages: 761-762
doi>10.1145/1835449.1835603
Full text: PDF

Traditional retrieval evaluation uses explicit relevance judgments which are expensive to collect. Relevance assessments inferred from implicit feedback such as click-through data can be collected inexpensively, but may be less reliable. We compare assessments ...
Personalize web search results with user's location
Yumao Lu, Fuchun Peng, Xing Wei, Benoit Dumoulin
Pages: 763-764
doi>10.1145/1835449.1835604
Full text: PDF

We build a probabilistic model to identify implicit local intent queries, and leverage the user's physical location to improve Web search results for these queries. Evaluation on a commercial search engine shows significant improvement in search relevance ...
Using search session context for named entity recognition in query
Junwu Du, Zhimin Zhang, Jun Yan, Yan Cui, Zheng Chen
Pages: 765-766
doi>10.1145/1835449.1835605
Full text: PDF

Recently, the problem of Named Entity Recognition in Query (NERQ) has attracted increasing attention in the field of information retrieval. However, the lack of context information in short queries makes some classical named entity recognition (NER) ...
Evaluating whole-page relevance
Peter Bailey, Nick Craswell, Ryen W. White, Liwei Chen, Ashwin Satyanarayana, S.M.M. Tahaghoghi
Pages: 767-768
doi>10.1145/1835449.1835606
Full text: PDF

Whole page relevance defines how well the surface-level representation of all elements on a search result page and the corresponding holistic attributes of the presentation respond to users' information needs. We introduce a method for evaluating the ...
Predicting escalations of medical queries based on web page structure and content
Ryen W. White, Eric Horvitz
Pages: 769-770
doi>10.1145/1835449.1835607
Full text: PDF

Logs of users' searches on Web health topics can exhibit signs of escalation of medical concerns, where initial queries about common symptoms are followed by queries about serious, rare illnesses. We present an effort to predict such escalations based ...
Contextual video advertising system using scene information inferred from video scripts
Bong-Jun Yi, Jung-Tae Lee, Hyun-Wook Woo, Hae-Chang Rim
Pages: 771-772
doi>10.1145/1835449.1835608
Full text: PDF

With the rise of digital video consumption, demand for contextual video advertising has been increasing in recent years. This paper presents a novel video advertising system that selects relevant text ads for a given video scene by automatically identifying ...
Cross-language retrieval using link-based language models
Benjamin Roth, Dietrich Klakow
Pages: 773-774
doi>10.1145/1835449.1835609
Full text: PDF

We propose a cross-language retrieval model that is solely based on Wikipedia as a training corpus. The main contributions of our work are: 1. A translation model based on linked text in Wikipedia and a term weighting method associated with it. 2. A ...
Search system requirements of patent analysts
Leif Azzopardi, Wim Vanderbauwhede, Hideo Joho
Pages: 775-776
doi>10.1145/1835449.1835610
Full text: PDF

Patent search tasks are difficult and challenging, often requiring expert patent analysts to spend hours, even days, sourcing relevant information. To aid them in this process, analysts use Information Retrieval systems and tools to cope with their retrieval ...
On performance of topical opinion retrieval
Giambattista Amati, Giuseppe Amodeo, Valerio Capozio, Carlo Gaibisso, Giorgio Gambosi
Pages: 777-778
doi>10.1145/1835449.1835611
Full text: PDF

We investigate the effectiveness of both the standard evaluation measures and the opinion component for topical opinion retrieval. We analyze how relevance is affected by opinions by perturbing relevance ranking by the outcomes of opinion-only classifiers ...
Improving sentence retrieval with an importance prior
Leif Azzopardi, Ronald T. Fernández, David E. Losada
Pages: 779-780
doi>10.1145/1835449.1835612
Full text: PDF

The retrieval of sentences is a core task within Information Retrieval. In this poster we employ a Language Model that incorporates a prior which encodes the importance of sentences within the retrieval model. Then, in a set of comprehensive experiments ...
Focused access to sparsely and densely relevant documents
Paavo Arvola, Jaana Kekäläinen, Marko Junkkari
Pages: 781-782
doi>10.1145/1835449.1835613
Full text: PDF

XML retrieval provides focused access to the relevant content of documents. However, in evaluation, full document retrieval has appeared competitive with focused XML retrieval. We analyze the density of relevance in documents, and show that in sparsely ...
Text document clustering with metric learning
Jinlong Wang, Shunyao Wu, Huy Quan Vu, Gang Li
Pages: 783-784
doi>10.1145/1835449.1835614
Full text: PDF

One reason semi-supervised clustering fails to deliver satisfactory performance in document clustering is that the transformed optimization problem could have many candidate solutions, but existing methods provide no mechanism to select a suitable ...
Predicting query performance on the web
Niranjan Balasubramanian, Giridhar Kumaran, Vitor R. Carvalho
Pages: 785-786
doi>10.1145/1835449.1835615
Full text: PDF

Predicting the performance of web queries is useful for several applications such as automatic query reformulation and automatic spell correction. In the web environment, accurate performance prediction is challenging because measures such as clarity ...
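The clarity score mentioned above is commonly computed as the KL divergence, in bits, between a query language model P(w|Q) and the collection model P(w|C). The minimal sketch below shows the measure itself, not the poster's web-specific predictor. It assumes:

- both models are already estimated and passed in as dictionaries,
- the collection model covers every word in the query model.

Estimating P(w|Q) from top-ranked documents is omitted.

```python
import math
from typing import Dict


def clarity(query_model: Dict[str, float], collection_model: Dict[str, float]) -> float:
    """KL divergence (bits) of the query model from the collection model.

    Assumes every word with non-zero query probability also has non-zero
    probability in `collection_model`.
    """
    score = 0.0
    for w, p_q in query_model.items():
        if p_q > 0.0:
            score += p_q * math.log2(p_q / collection_model[w])
    return score
```

A query model identical to the collection model scores 0. The more the query model concentrates on words that are rare in the collection, the higher the score. Computing it requires a full query model, which is part of what makes such measures costly at web scale.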
Hashtag retrieval in a microblogging environment
Miles Efron
Pages: 787-788
doi>10.1145/1835449.1835616
Full text: PDF

Microblog services let users broadcast brief textual messages to people who "follow" their activity. Often these posts contain terms called hashtags, markers of a post's meaning, audience, etc. This poster treats the following problem: given a user's ...
Crowdsourcing a wikipedia vandalism corpus
Martin Potthast
Pages: 789-790
doi>10.1145/1835449.1835617
Full text: PDF

We report on the construction of the PAN Wikipedia vandalism corpus, PAN-WVC-10, using Amazon's Mechanical Turk. The corpus compiles 32452 edits on 28468 Wikipedia articles, among which 2391 vandalism edits have been identified. 753 human annotators ...
MEMOSE: search engine for emotions in multimedia documents
Kathrin Knautz, Tobias Siebenlist, Wolfgang G. Stock
Pages: 791-792
doi>10.1145/1835449.1835618
Full text: PDF

The MEMOSE (Media Emotion Search) system is a specialized search engine for fundamental emotions in all kinds of emotion-laden documents. We apply a controlled vocabulary for basic emotions, a slide control to adjust the intensities of the emotions ...
Hierarchical pitman-yor language model for information retrieval
Saeedeh Momtazi, Dietrich Klakow
Pages: 793-794
doi>10.1145/1835449.1835619
Full text: PDF

In this paper, we propose a new application of a Bayesian language model based on the Pitman-Yor process for information retrieval. This model is a generalization of the Dirichlet distribution. The Pitman-Yor process creates a power-law distribution which ...
Entity summarization of news articles
Gianluca Demartini, Malik Muhammad Saad Missen, Roi Blanco, Hugo Zaragoza
Pages: 795-796
doi>10.1145/1835449.1835620
Full text: PDF

In this paper we study the problem of entity retrieval for news applications and the importance of the news trail history (i.e. past related articles) to determine the relevant entities in current articles. We construct a novel entity-labeled corpus ...
The power of naive query segmentation
Matthias Hagen, Martin Potthast, Benno Stein, Christof Braeutigam
Pages: 797-798
doi>10.1145/1835449.1835621
Full text: PDF

We address the problem of query segmentation: given a keyword query submitted to a search engine, the task is to group the keywords into phrases, if possible. Previous approaches to the problem achieve good segmentation performance on a gold standard ...
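A keyword query of n words admits 2^(n-1) segmentations into contiguous phrases. The sketch below enumerates them and picks one using a deliberately simple placeholder score: the sum of hypothetical n-gram frequencies of the multi-word segments. The function names, the scoring rule and the frequency table are all illustrative assumptions. They are not the weighting used in the poster.

```python
from typing import Dict, Iterator, List


def segmentations(words: List[str]) -> Iterator[List[str]]:
    """Yield every split of `words` into contiguous phrases (2**(n-1) of them)."""
    n = len(words)
    if n == 0:
        yield []
        return
    for mask in range(2 ** (n - 1)):
        segments, start = [], 0
        for i in range(1, n):
            if mask & (1 << (i - 1)):  # bit i-1 set: break between words i-1 and i
                segments.append(" ".join(words[start:i]))
                start = i
        segments.append(" ".join(words[start:]))
        yield segments


def best_segmentation(words: List[str], ngram_freq: Dict[str, int]) -> List[str]:
    """Placeholder scoring: sum of (hypothetical) frequencies of multi-word segments."""
    return max(
        segmentations(words),
        key=lambda segs: sum(ngram_freq.get(s, 0) for s in segs if " " in s),
    )
```

For example, take the made-up frequencies {"new york": 100, "york times": 5, "new york times": 80}. With these, the query "new york times" is segmented as ["new york", "times"]. A realistic scorer would need to weight longer segments, which is precisely where segmentation methods differ.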
Clicked phrase document expansion for sponsored search ad retrieval
Dustin Hillard, Chris Leggetter
Pages: 799-800
doi>10.1145/1835449.1835622
Full text: PDF

We present a document expansion approach that uses Conditional Random Field (CRF) segmentation to automatically extract salient phrases from ad titles. We then supplement the ad document with query segments that are probable translations of the document ...
Three web-based heuristics to determine a person's or institution's country of origin
Markus Schedl, Klaus Seyerlehner, Dominik Schnitzer, Gerhard Widmer, Cornelia Schiketanz
Pages: 801-802
doi>10.1145/1835449.1835623
Full text: PDF

We propose three heuristics to determine the country of origin of a person or institution via text-based IE from the Web. We evaluate all methods on a collection of music artists and bands, and show that some heuristics outperform earlier work on the ...
Exploiting click-through data for entity retrieval
Bodo Billerbeck, Gianluca Demartini, Claudiu Firan, Tereza Iofciu, Ralf Krestel
Pages: 803-804
doi>10.1145/1835449.1835624
Full text: PDF

We present an approach for answering Entity Retrieval queries using click-through information in query log data from a commercial Web search engine. We compare results using click graphs and session graphs and present an evaluation test set making use ...
Feature subset non-negative matrix factorization and its applications to document understanding
Dingding Wang, Chris Ding, Tao Li
Pages: 805-806
doi>10.1145/1835449.1835625
Full text: PDF

In this paper, we propose feature subset non-negative matrix factorization (NMF), which is an unsupervised approach to simultaneously cluster data points and select important features. We apply our proposed approach to various document understanding ...
Learning to rank query reformulations
Van Dang, Michael Bendersky, W. Bruce Croft
Pages: 807-808
doi>10.1145/1835449.1835626
Full text: PDF

Query reformulation techniques based on query logs have recently proven to be effective for web queries. However, when initial queries have reasonably good quality, these techniques are often not reliable enough to identify the helpful reformulations ...
Many are better than one: improving multi-document summarization via weighted consensus
Dingding Wang, Tao Li
Pages: 809-810
doi>10.1145/1835449.1835627
Full text: PDF

Given a collection of documents, various multi-document summarization methods have been proposed to generate a short summary. However, few studies have been reported on aggregating different summarization methods to possibly generate better summarization ...
Exploring the use of labels to shortcut search trails
Ryen W. White, Raman Chandrasekar
Pages: 811-812
doi>10.1145/1835449.1835628
Full text: PDF

Search trails comprising queries and Web page views are created as searchers engage in information-seeking activity online. During known-item search (where the objective may be to locate a target Web page), searchers may waste valuable time repeatedly ...
Investigating the suboptimality and instability of pseudo-relevance feedback
Raghavendra Udupa, Abhijit Bhole
Pages: 813-814
doi>10.1145/1835449.1835629
Full text: PDF

Although Pseudo-Relevance Feedback (PRF) techniques improve average retrieval performance at the price of high variance, not much is known about their optimality and the reasons for their instability. In this work, we study more than 800 topics from ...
From fusion to re-ranking: a semantic approach
Annalina Caputo, Pierpaolo Basile, Giovanni Semeraro
Pages: 815-816
doi>10.1145/1835449.1835630
Full text: PDF

A number of works have shown that the aggregation of several Information Retrieval (IR) systems works better than each system working individually. Nevertheless, early investigation in the context of the CLEF Robust-WSD task, in which semantics is involved, ...
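Aggregating the output of several IR systems is classically done with score-fusion rules such as CombSUM and CombMNZ. The minimal sketch below applies min-max score normalization to each run before fusing. It is the generic baseline, not the semantic re-ranking approach the poster proposes. The run format (a dict from document id to score) and the function names are assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def min_max_normalize(run: Dict[str, float]) -> Dict[str, float]:
    """Rescale one run's scores to [0, 1]; a constant run maps to all 1.0."""
    if not run:
        return {}
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {doc: 1.0 for doc in run}
    return {doc: (s - lo) / (hi - lo) for doc, s in run.items()}


def fuse(runs: List[Dict[str, float]], mnz: bool = True) -> List[str]:
    """CombSUM (mnz=False) or CombMNZ (mnz=True) over min-max normalized runs."""
    total: Dict[str, float] = defaultdict(float)
    hits: Dict[str, int] = defaultdict(int)
    for run in runs:
        for doc, s in min_max_normalize(run).items():
            total[doc] += s
            hits[doc] += 1
    # CombMNZ multiplies the summed score by the number of runs retrieving the doc.
    fused = {doc: total[doc] * (hits[doc] if mnz else 1) for doc in total}
    # Highest fused score first; ties broken by document id.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))
```

CombMNZ rewards documents that several systems agree on. For that reason it often outperforms the individual runs when the systems are comparably effective.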
High precision opinion retrieval using sentiment-relevance flows
Seung-Wook Lee, Jung-Tae Lee, Young-In Song, Hae-Chang Rim
Pages: 817-818
doi>10.1145/1835449.1835631
Full text: PDF

Opinion retrieval involves measuring the opinion score of a document with respect to the given topic. We propose a new method, namely sentiment-relevance flow, that naturally unifies the topic relevance and the opinionated nature of a document. Experiments ...
Ontology-enriched multi-document summarization in disaster management
Lei Li, Dingding Wang, Chao Shen, Tao Li
Pages: 819-820
doi>10.1145/1835449.1835632
Full text: PDF

In this poster, we propose a novel document summarization approach named Ontology-enriched Multi-Document Summarization (OMS) for utilizing background knowledge to improve summarization results. OMS first maps the sentences of input documents onto an ...
Multi-view clustering of multilingual documents
Young-Min Kim, Massih-Reza Amini, Cyril Goutte, Patrick Gallinari
Pages: 821-822
doi>10.1145/1835449.1835633
Full text: PDF

We propose a new multi-view clustering method which uses clustering results obtained on each view as a voting pattern in order to construct a new set of multi-view clusters. Our experiments on a multilingual corpus of documents show that performance ...
A stack decoder approach to approximate string matching
Juan M. Huerta
Pages: 823-824
doi>10.1145/1835449.1835634
Full text: PDF

We present a new efficient algorithm for top-N match retrieval of sequential patterns. Our approach is based on an incremental approximation of the string edit distance using index information and a stack based search. Our approach produces hypotheses ...
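As a point of comparison for the approximation described above, the exact string edit (Levenshtein) distance can be computed by dynamic programming. A brute-force top-N retrieval then simply ranks every candidate by that distance. The sketch below is only this exhaustive baseline: it uses neither an index nor the poster's stack-based search, and the function names are hypothetical.

```python
import heapq
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using two rolling rows of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def top_n_matches(query: str, candidates: List[str], n: int) -> List[str]:
    """Exhaustive top-N: rank all candidates by distance, ties broken alphabetically."""
    return heapq.nsmallest(n, candidates, key=lambda c: (edit_distance(query, c), c))
```

The exhaustive version costs O(|query| · |candidate|) per candidate. Indexed, incremental search strategies aim to avoid exactly that cost on large pattern collections.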
Late fusion of compact composite descriptors for retrieval from heterogeneous image databases
Savvas A. Chatzichristofis, Avi Arampatzis
Pages: 825-826
doi>10.1145/1835449.1835635
Full text: PDF

Compact composite descriptors (CCDs) are global image features, capturing more than one type of information at the same time in a very compact representation. Their quality has so far been evaluated in retrieval from several homogeneous databases containing ...
Inferring user intent in web search by exploiting social annotations
Jose M. Conde, David Vallet, Pablo Castells
Pages: 827-828
doi>10.1145/1835449.1835636
Full text: PDF

In this paper, we present a folksonomy-based approach for implicit user intent extraction during a Web search process. We present a number of result re-ranking techniques based on this representation that can be applied to any Web search engine. We perform ...
Query term ranking based on dependency parsing of verbose queries
Jae Hyun Park, W. Bruce Croft
Pages: 829-830
doi>10.1145/1835449.1835637
Full text: PDF

Query term ranking approaches are used to select effective terms from a verbose query by ranking terms. Features used for query term ranking and selection in previous work do not consider grammatical relationships between terms. To address this issue, ...
A ranking approach to target detection for automatic link generation
Jiyin He, Maarten de Rijke
Pages: 831-832
doi>10.1145/1835449.1835638
Full text: PDF

We focus on the task of target detection in automatic link generation with Wikipedia, i.e., given an N-gram in a snippet of text, find the relevant Wikipedia concepts that explain or provide background knowledge for it. We formulate the task as a ranking ...
Probabilistic latent maximal marginal relevance
Shengbo Guo, Scott Sanner
Pages: 833-834
doi>10.1145/1835449.1835639
Full text: PDF

Diversity has been heavily motivated in the information retrieval literature as an objective criterion for result sets in search and recommender systems. Perhaps one of the most well-known and most used algorithms for result set diversification is that ...
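The well-known diversification algorithm alluded to above is maximal marginal relevance (MMR). MMR greedily adds the document that maximizes λ·rel(d) - (1 - λ)·max_{s in S} sim(d, s), where S is the set already selected. The minimal sketch below implements that greedy procedure. It assumes precomputed relevance scores and a symmetric similarity function, both hypothetical inputs. The probabilistic latent variant proposed in the poster is not shown.

```python
from typing import Callable, Dict, List


def mmr_rerank(
    relevance: Dict[str, float],
    similarity: Callable[[str, str], float],
    k: int,
    lam: float = 0.5,
) -> List[str]:
    """Greedy maximal marginal relevance selection of up to k documents."""
    selected: List[str] = []
    remaining = sorted(relevance)  # sorted so ties break deterministically by id

    def marginal_score(doc: str) -> float:
        # Redundancy is the highest similarity to anything already selected.
        redundancy = max((similarity(doc, s) for s in selected), default=0.0)
        return lam * relevance[doc] - (1.0 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=marginal_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With λ = 1 the procedure reduces to ranking by relevance alone. Lowering λ increasingly penalizes documents that are near-duplicates of earlier picks.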
Using local precision to compare search engines in consumer health information retrieval
Carla Teixeira Lopes, Cristina Ribeiro
Pages: 835-836
doi>10.1145/1835449.1835640
Full text: PDF

We have conducted a user study to evaluate several generalist and health-specific search engines on health information retrieval. Users evaluated the relevance of the top 30 documents of 4 search engines for two different health information needs. We ...
multi Searcher: can we support people to get information from text they can't read or understand?
Farag Ahmed, Andreas Nürnberger
Pages: 837-838
doi>10.1145/1835449.1835641
Full text: PDF

The goal of the proposed tool multi Searcher is to answer this research question: can we expect people to be able to get information from text in languages they cannot read or understand? The proposed tool multi Searcher provides users with interactive ...
Linking wikipedia to the web
Rianne Kaptein, Pavel Serdyukov, Jaap Kamps
Pages: 839-840
doi>10.1145/1835449.1835642
Full text: PDF

We investigate the task of finding links from Wikipedia pages to external web pages. Such external links significantly extend the information in Wikipedia with information from the Web at large, while retaining the encyclopedic organization of Wikipedia. ...
Short text classification in twitter to improve information filtering
Bharath Sriram, Dave Fuhry, Engin Demir, Hakan Ferhatosmanoglu, Murat Demirbas
Pages: 841-842
doi>10.1145/1835449.1835643
Full text: PDF

In microblogging services such as Twitter, the users may become overwhelmed by the raw data. One solution to this problem is the classification of short text messages. As short texts do not provide sufficient word occurrences, traditional classification ...
A framework for BM25F-based XML retrieval
Kelly Y. Itakura, Charles L.A. Clarke
Pages: 843-844
doi>10.1145/1835449.1835644
Full text: PDF

We evaluate a framework for BM25F-based XML element retrieval. The framework gathers contextual information associated with each XML element into an associated field, which we call a characteristic field. The contents of the element and the contents ...
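As background for the framework above: a simple form of BM25F merges per-field term frequencies into a single pseudo-frequency before applying the usual BM25 saturation. Each field's contribution is weighted and normalized by that field's own length. The minimal sketch below illustrates this. The following choices are assumptions, not the parameters or exact formulation evaluated in the poster:

- the field names (for example an element's own text and its characteristic field),
- the weights and b values,
- k1 = 1.2,
- the log(1 + ...) IDF variant.

```python
import math
from collections import Counter
from typing import Dict, List


def bm25f_score(
    query_terms: List[str],
    doc_fields: Dict[str, List[str]],
    field_weights: Dict[str, float],
    field_b: Dict[str, float],
    avg_field_len: Dict[str, float],
    doc_freq: Dict[str, int],
    num_docs: int,
    k1: float = 1.2,
) -> float:
    """Simple BM25F: combine weighted, length-normalized field tfs, then saturate once."""
    counts = {f: Counter(tokens) for f, tokens in doc_fields.items()}
    score = 0.0
    for t in query_terms:
        pseudo_tf = 0.0
        for f, tokens in doc_fields.items():
            if counts[f][t] == 0:
                continue  # also avoids normalizing an empty field
            b = field_b[f]
            norm = 1.0 - b + b * len(tokens) / avg_field_len[f]
            pseudo_tf += field_weights[f] * counts[f][t] / norm
        if pseudo_tf == 0.0:
            continue
        df = doc_freq.get(t, 0)
        idf = math.log(1.0 + (num_docs - df + 0.5) / (df + 0.5))
        score += idf * pseudo_tf / (k1 + pseudo_tf)
    return score
```

Saturating once, after the fields are combined, keeps a term that is repeated across fields from being rewarded linearly. That property distinguishes BM25F from simply summing per-field BM25 scores.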
Can search systems detect users' task difficulty?: some behavioral signals
Jingjing Liu, Chang Liu, Jacek Gwizdka, Nicholas J. Belkin
Pages: 845-846
doi>10.1145/1835449.1835645
Full text: PDF

In this paper, we report findings on how user behaviors vary in tasks with different difficulty levels as well as of different types. Two behavioral signals, document dwell time and the number of content pages viewed per query, were found to be able to help ...
Query log analysis in the context of information retrieval for children
Sergio Duarte Torres, Djoerd Hiemstra, Pavel Serdyukov
Pages: 847-848
doi>10.1145/1835449.1835646
Full text: PDF

In this paper we analyze queries and sessions intended to satisfy children's information needs using a large-scale query log. The aim of this analysis is twofold: i) To identify differences between such queries and sessions, and general queries and sessions; ...
Transitive history-based query disambiguation for query reformulation
Karim Filali, Anish Nair, Chris Leggetter
Pages: 849-850
doi>10.1145/1835449.1835647
Full text: PDF

We present a probabilistic model of a user's search history and a target query reformulation. We derive a simple transitive similarity algorithm for disambiguating queries and improving history-based query reformulation accuracy. We compare the merits ...
Using flickr geotags to predict user travel behaviour
Maarten Clements, Pavel Serdyukov, Arjen P. de Vries, Marcel J.T. Reinders
Pages: 851-852
doi>10.1145/1835449.1835648
Full text: PDF

We propose a method to predict a user's favourite locations in a city, based on his Flickr geotags in other cities. We define a similarity between the geotag distributions of two users based on a Gaussian kernel convolution. The geotags of the most similar ...
Metrics for assessing sets of subtopics
Filip Radlinski, Martin Szummer, Nick Craswell
Pages: 853-854
doi>10.1145/1835449.1835649
Full text: PDF

To evaluate the diversity of search results, test collections have been developed that identify multiple intents for each query. Intents are the different meanings or facets that should be covered in a search results list. This means that topic development ...
Learning to select rankers
Niranjan Balasubramanian, James Allan
Pages: 855-856
doi>10.1145/1835449.1835650
Full text: PDF

Combining evidence from multiple retrieval models has been widely studied in the context of distributed search, metasearch and rank fusion. Much of the prior work has focused on combining retrieval scores (or the rankings) assigned by different retrieval ...
VisualSum: an interactive multi-document summarization system using visualization
Yi Zhang, Dingding Wang, Tao Li
Pages: 857-858
doi>10.1145/1835449.1835651
Full text: PDF

Given a collection of documents, most existing multi-document summarization methods automatically generate a static summary for all users. However, different users may have different opinions on the documents, so there is a need for improving ...
Web page publication time detection and its application for page rank
Zhumin Chen, Jun Ma, Chaoran Cui, Hongxing Rui, Shaomang Huang
Pages: 859-860
doi>10.1145/1835449.1835652
Full text: PDF

Publication Time (P-time for short) of Web pages is often required in many application areas. In this paper, we address the issue of P-time detection and its application for page rank. We first propose an approach to extract P-time for a page with explicit ...
HCC: a hierarchical co-clustering algorithm
Jingxuan Li, Tao Li
Pages: 861-862
doi>10.1145/1835449.1835653
Full text: PDF

In this poster, we develop a novel method, called HCC, for hierarchical co-clustering. HCC brings together two interrelated but distinct themes from clustering: hierarchical clustering and co-clustering. The goal of the former theme is to organize clusters ...
Retrieval system evaluation: automatic evaluation versus incomplete judgments
Claudia Hauff, Franciska de Jong
Pages: 863-864
doi>10.1145/1835449.1835654
Full text: PDF

In information retrieval (IR), research aiming to reduce the cost of retrieval system evaluations has been conducted along two lines: (i) the evaluation of IR systems with reduced amounts of manual relevance assessments, and (ii) the fully automatic ...
Aspect presence verification conditional on other aspects
Dmitri Roussinov
Pages: 865-866
doi>10.1145/1835449.1835655
Full text: PDF

I have shown that the presence of difficult query aspects that are revealed only implicitly (e.g. exploration, opposition, achievements, cooperation, risks) can be improved by taking advantage of the known presence of other, easier to verify query aspects. ...
The value of visual elements in web search
Marilyn Ostergren, Seung-yon Yu, Efthimis N. Efthimiadis
Pages: 867-868
doi>10.1145/1835449.1835656
Full text: PDF

We used eye-tracking equipment to observe 36 participants as they performed three search tasks using three graphically-enhanced web search interfaces (Kartoo, SearchMe and Viewzi). In this poster we describe findings of the study focusing on how the ...
Diversification of search results using webgraphs
Praveen Chandar, Ben Carterette
Pages: 869-870
doi>10.1145/1835449.1835657
Full text: PDF

A set of words is often insufficient to express a user's information need. In order to account for various information needs associated with a query, diversification seems to be a reasonable strategy. By diversifying the result set, we increase the probability ...
Capturing page freshness for web search
Na Dai, Brian D. Davison
Pages: 871-872
doi>10.1145/1835449.1835658
Full text: PDF

Freshness has increasingly been recognized by commercial search engines as an important criterion for measuring the quality of search results. However, most information retrieval methods focus on the relevance of page content to given queries without considering ...
S-PLASA+: adaptive sentiment analysis with application to sales performance prediction
Yang Liu, Xiaohui Yu, Xiangji Huang, Aijun An
Pages: 873-874
doi>10.1145/1835449.1835659
Full text: PDF

Analyzing the large volume of online reviews would produce useful knowledge that could be of economic value to vendors and other interested parties. In particular, the sentiments expressed in the online reviews have been shown to be strongly correlated ...
Supervised query modeling using wikipedia
Edgar Meij, Maarten de Rijke
Pages: 875-876
doi>10.1145/1835449.1835660
Full text: PDF

We use Wikipedia articles to semantically inform the generation of query models. To this end, we apply supervised machine learning to automatically link queries to Wikipedia articles and sample terms from the linked articles to re-estimate the query ...
A two-stage model for blog feed search
Wouter Weerkamp, Krisztian Balog, Maarten de Rijke
Pages: 877-878
doi>10.1145/1835449.1835661
Full text: PDF

We consider blog feed search: identifying relevant blogs for a given topic. An individual's search behavior often involves a combination of exploratory behavior triggered by salient features of the information objects being examined plus goal-directed ...
Machine learned ranking of entity facets
Roelof van Zwol, Lluís Garcia Pueyo, Mridul Muralidharan, Börkur Sigurbjörnsson
Pages: 879-880
doi>10.1145/1835449.1835662
Full text: PDF

The research described in this paper forms the backbone of a service that enables the faceted search experience of the Yahoo! search engine. We introduce an approach for a machine learned ranking of entity facets based on user click feedback and features ...
User comments for news recommendation in social media
Jia Wang, Qing Li, Yuanzhu Peter Chen
Pages: 881-882
doi>10.1145/1835449.1835663
Full text: PDF

Reading and commenting on online news is becoming a common user behavior in social media. Discussion in the form of comments following news postings can be effectively facilitated if the service provider can recommend articles based on not only the original ...
Incorporating global information into named entity recognition systems using relational context
Yuval Merhav, Filipe Mesquita, Denilson Barbosa, Wai Gen Yee, Ophir Frieder
Pages: 883-884
doi>10.1145/1835449.1835664
Full text: PDF

The state-of-the-art in Named Entity Recognition relies on a combination of local features of the text and global knowledge to determine the types of the recognized entities. This is problematic in some cases, resulting in entities being classified as ...
Achieving high accuracy retrieval using intra-document term ranking
Hyun-Wook Woo, Jung-Tae Lee, Seung-Wook Lee, Young-In Song, Hae-Chang Rim
Pages: 885-886
doi>10.1145/1835449.1835665
Full text: PDF

Most traditional ranking models roughly score the relevance of a given document by observing simple term statistics, such as the occurrence of query terms within the document or within the collection. Intuitively, the relative importance of query terms ...
Author interest topic model
Noriaki Kawamae
Pages: 887-888
doi>10.1145/1835449.1835666
Full text: PDF

This paper presents a hierarchical topic model that simultaneously captures topics and author's interests. Our proposal, the Author Interest Topic model (AIT), introduces a latent variable with a separate probability distribution over topics into each ...
On the relationship between effectiveness and accessibility
Leif Azzopardi, Richard Bache
Pages: 889-890
doi>10.1145/1835449.1835667
Full text: PDF

Typically the evaluation of Information Retrieval (IR) systems is focused upon two main system attributes: efficiency and effectiveness. However, it has been argued that it is also important to consider accessibility, i.e. the extent to which the IR ...
Visual concept-based selection of query expansions for spoken content retrieval
Stevan Rudinac, Martha Larson, Alan Hanjalic
Pages: 891-892
doi>10.1145/1835449.1835668
Full text: PDF

In this paper we present a novel approach to semantic-theme-based video retrieval that considers entire videos as retrieval units and exploits automatically detected visual concepts to improve the results of retrieval based on spoken content. We deploy ...
Mining adjacent markets from a large-scale ads video collection for image advertising
Guwen Feng, Xin-Jing Wang, Lei Zhang, Wei-Ying Ma
Pages: 893-894
doi>10.1145/1835449.1835669
Full text: PDF

The research on image advertising is still in its infancy. Most previous approaches suggest ads by directly matching an ad to a query image, which lacks the power to identify ads from adjacent markets. In this paper, we tackle the problem by mining knowledge ...
A co-learning framework for learning user search intents from rule-generated training data
Jun Yan, Zeyu Zheng, Li Jiang, Yan Li, Shuicheng Yan, Zheng Chen
Pages: 895-896
doi>10.1145/1835449.1835670
Full text: PDF

Learning to understand user search intents from their online behaviors is crucial for both Web search and online advertising. However, it is a challenging task to collect and label a sufficient amount of high quality training data for various user intents ...
Learning the click-through rate for rare/new ads from similar ads
Kushal S. Dave, Vasudeva Varma
Pages: 897-898
doi>10.1145/1835449.1835671
Full text: PDF

Ads on the search engine (SE) are generally ranked based on their click-through rates (CTR). Hence, accurately predicting the CTR of an ad is of paramount importance for maximizing the SE's revenue. We present a model that inherits the click information ...
Graphical models for text: a new paradigm for text representation and processing
Charu Aggarwal, Peixiang Zhao
Pages: 899-900
doi>10.1145/1835449.1835672
Full text: PDF

Almost all text applications use the well known vector-space model for text representation and analysis. While the vector-space model has proven itself to be an effective and efficient representation for mining purposes, it does not preserve information ...
A survival modeling approach to biomedical search result diversification using Wikipedia
Xiaoshi Yin, Jimmy Xiangji Huang, Xiaofeng Zhou, Zhoujun Li
Pages: 901-902
doi>10.1145/1835449.1835673
Full text: PDF

In this paper, we propose a probabilistic survival model derived from the survival analysis theory for measuring aspect novelty. The retrieved documents' query-relevance and novelty are combined at the aspect level for re-ranking. Experiments conducted ...
TUTORIAL SESSION: Tutorials
Low cost evaluation in information retrieval
Ben Carterette, Evangelos Kanoulas, Emine Yilmaz
Pages: 903-903
doi>10.1145/1835449.1835675
Full text: PDF

Search corpora are growing larger and larger: over the last 10 years, the IR research community has moved from the several hundred thousand documents on the TREC disks to the tens of millions of U.S. government web pages of GOV2 to the one billion general-interest ...
Learning to rank for information retrieval
Tie-Yan Liu
Pages: 904-904
doi>10.1145/1835449.1835676
Full text: PDF

This tutorial offers a comprehensive introduction to the research area of learning to rank for information retrieval. In the first part of the tutorial, we will introduce three major approaches to learning to rank, i.e., the pointwise, pairwise, ...
Introduction to probabilistic models in IR
Victor P. Lavrenko
Pages: 905-905
doi>10.1145/1835449.1835677
Full text: PDF

Most of today's state-of-the-art retrieval models, including BM25 and language modeling, are grounded in probabilistic principles. Having a working understanding of these principles can help researchers understand existing retrieval models better and ...
Multimedia information retrieval
Stefan Rueger
Pages: 906-906
doi>10.1145/1835449.1835678
Full text: PDF

This tutorial is concerned with creating the best possible multimedia search experience. The intriguing bit here is that the query itself can be a multimedia excerpt: For example, when you walk around in an unknown place and stumble across an interesting ...
Web retrieval: the role of users
Ricardo Baeza-Yates, Yoelle Maarek
Pages: 907-907
doi>10.1145/1835449.1835679
Full text: PDF

Web retrieval methods have evolved through three major steps in the last decade or so. They started from standard document-centric IR in the early days of the Web, then made a major step forward by leveraging the structure of the Web, using link analysis ...
Information retrieval challenges in computational advertising
Andrei Broder, Evgeniy Gabrilovich, Vanja Josifovski
Pages: 908-908
doi>10.1145/1835449.1835680
Full text: PDF

Computational advertising is an emerging scientific sub-discipline, at the intersection of large scale search and text analysis, information retrieval, statistical modeling, machine learning, classification, optimization, and microeconomics. The central ...
Extraction of open-domain class attributes from text: building blocks for faceted search
Marius Pasca
Pages: 909-909
doi>10.1145/1835449.1835681
Full text: PDF

Knowledge automatically extracted from text captures instances, classes of instances and relations among them. In particular, the acquisition of class attributes (e.g., "top speed", "body style" and "number of cylinders" for the class of "sports cars") ...
From federated to aggregated search
Fernando Diaz, Mounia Lalmas, Milad Shokouhi
Pages: 910-910
doi>10.1145/1835449.1835682
Full text: PDF

Federated search refers to the brokered retrieval of content from a set of auxiliary retrieval systems instead of from a single, centralized retrieval system. Federated search tasks occur in, for example, digital libraries (where documents from several ...
Estimating the query difficulty for information retrieval
David Carmel, Elad Yom-Tov
Pages: 911-911
doi>10.1145/1835449.1835683
Full text: PDF

Many information retrieval (IR) systems suffer from a radical variance in performance when responding to users' queries. Even for systems that succeed very well on average, the quality of results returned for some of the queries is poor. Thus, it is ...
Search and browse log mining for web information retrieval: challenges, methods, and applications
Daxin Jiang, Jian Pei, Hang Li
Pages: 912-912
doi>10.1145/1835449.1835684
Full text: PDF

Huge amounts of search log data have been accumulated in various search engines. Currently, a commercial search engine receives billions of queries and collects terabytes of log data on any single day. Other than search log data, browse logs can be ...
Information retrieval for e-discovery
David D. Lewis
Pages: 913-913
doi>10.1145/1835449.1835685
Full text: PDF

Discovery, the process under which parties to legal cases must reveal documents relevant to the disputed issues, is a core aspect of trials in the United States, and a lesser but important factor in other countries. Discovery on documents stored in computerized ...
SESSION: Doctoral consortium
On the mono- and cross-language detection of text reuse and plagiarism
Alberto Barrón-Cedeño
Pages: 914-914
doi>10.1145/1835449.1835687
Full text: PDF

Plagiarism, the unacknowledged reuse of text, has increased in recent years due to the large amount of text readily available. For instance, recent studies claim that nowadays a high proportion of student reports include plagiarism, making manual plagiarism ...
User interface designs to support the social transfer of web search expertise
Neema Moraveji
Pages: 915-915
doi>10.1145/1835449.1835688
Full text: PDF

While there are many ways to develop search expertise, I maintain that most members of the general public do so in an inefficient manner. One reason is that, with current tools, it is difficult to observe experts as a means of acquiring search expertise ...
Leveraging user interaction and collaboration for improving multilingual information access in digital libraries
Juliane Stiller
Pages: 916-916
doi>10.1145/1835449.1835689
Full text: PDF

The goal of interactive cross-lingual information retrieval systems is to support users in formulating effective queries and selecting the documents which satisfy their information needs regardless of the language of these documents. This dissertation ...
Entity information management in complex networks
Yi Fang
Pages: 917-917
doi>10.1145/1835449.1835690
Full text: PDF

Entity information management (EIM) deals with organizing, processing and delivering information about entities. Its emergence is driven by more sophisticated information needs that go beyond document search. In recent years, entity ...
Finding people and their utterances in social media
Wouter Weerkamp
Pages: 918-918
doi>10.1145/1835449.1835691
Full text: PDF

Since its introduction, social media, "a group of internet-based applications that (...) allow the creation and exchange of user generated content" [1], has attracted more and more users. Over the years, many platforms have arisen that allow users to ...
Leveraging user-generated content for news search
Richard M.C. McCreadie
Pages: 919-919
doi>10.1145/1835449.1835692
Full text: PDF

Over the last few years both availability and accessibility of current news stories on the Web have dramatically improved. In particular, users can now access news from a variety of sources hosted on the Web, from newswire presences such as the New York ...
User centered story tracking
Ilija Subasic
Pages: 920-920
doi>10.1145/1835449.1835693
Full text: PDF

Using data collections available on the Internet has, for many people, become the main medium for staying informed about the world. Many of these collections are dynamic in nature, evolving as the subjects they describe change. The goal of different research ...
Reverse annotation based retrieval from large document image collections
Pramod Sankar K.
Pages: 921-921
doi>10.1145/1835449.1835694
Full text: PDF

A number of projects are dedicated to creating digital libraries from scanned books, such as Google Books, UDL, Digital Library of India (DLI), etc. The ability to search in the content of document images is essential for the usability and popularity ...
Learning hidden variable models for blog retrieval
Mengqiu Wang
Pages: 922-922
doi>10.1145/1835449.1835695
Full text: PDF

We describe probabilistic models that leverage individual blog post evidence to improve blog seed retrieval performance. Our model offers an intuitive and principled method to combine multiple posts in scoring a whole blog site by treating individual ...
Investigation on smoothing and aggregation methods in blog retrieval
Mostafa Keikha
Pages: 923-923
doi>10.1145/1835449.1835696
Full text: PDF

Recently, user-generated data has been growing rapidly and is becoming one of the most important sources of information on the web. The blogosphere (the collection of blogs on the web) is one of the main sources of information in this category. In my work for my PhD, ...
Aiming for user experience in information retrieval: towards user-centered relevance (UCR)
Frans van der Sluis, Betsy van Dijk, Egon L. van den Broek
Pages: 924-924
doi>10.1145/1835449.1835697
Full text: PDF

The ACM Digital Library is published by the Association for Computing Machinery. Copyright © 2017 ACM, Inc.